\input acphdr % Section 3.3
\runninglefthead{RANDOM NUMBERS} % chapter title
\titlepage\setcpage0
\vfill
\tenpoint
\ctrline{SECTION 3.3 of THE ART OF COMPUTER PROGRAMMING}
\ctrline{$\copyright$ 1978 Addison--Wesley Publishing Company, Inc.}
\vfill
\runningrighthead{STATISTICAL TESTS}
section{3.3}
\eject
\setcpage 38
%folio 50 galley 14 (C) Addison-Wesley 1978 *
\sectionbegin{3.3. STATISTICAL TESTS}
O{\:cUR MAIN PURPOSE} is to obtain sequences which
behave as if they are random. So far we have seen how to make
the period of a sequence so long that for practical purposes
it never will repeat; this is an important criterion, but it
by no means guarantees that the sequence will be useful in applications.
How then are we to decide whether a sequence is sufficiently
random?
If we were to give some man a pencil and paper and ask him to
write down 100 random decimal digits, chances are very slim
that he would give a satisfactory result. People tend to avoid
things which seem nonrandom, such as pairs of equal adjacent
digits (although about one out of every 10 digits should equal
its predecessor). And if we were to show someone a table of truly
random digits, he would quite probably tell us they are not
random at all; his eye would spot certain apparent regularities.
According to Dr. I. J. Matrix (as quoted by Martin Gardner in
{\sl Scientific American}, January, 1965), ``Mathematicians
consider the decimal expansion of $π$ a random series, but to
a modern numerologist it is rich with remarkable patterns.''
Dr. Matrix has pointed out, for example, that the first repeated
two-digit number in $π$'s expansion is 26, and its second appearance
comes in the middle of a curious repetition pattern:
$$\eqalignno{\noalign{\vskip 6pt plus 3pt}⊗3.14159265358979323846264338327950⊗(1)\cr
\noalign{\vskip 12pt plus 3pt}}$$ % ARTWORK TO BE ADDED
After listing a dozen or so further properties of these digits, he
observed that $π$, when correctly interpreted, conveys the entire
history of the human race!
We all notice patterns in our telephone numbers, license numbers,
etc., as aids to memory. The point of these remarks is that
we cannot be trusted to judge by ourselves whether a sequence
of numbers is random or not. Some unbiased mechanical tests
must be applied.
The theory of statistics provides us with some quantitative
measures for randomness. There is literally no end to the number
of tests that can be conceived; we will discuss those tests
which have proved to be most useful, most instructive, and most
readily adapted to computer calculation.
If a sequence behaves randomly with respect to tests $T↓1$, $T↓2$,
$\ldotss$, $T↓n\,$, we cannot be {\sl sure} in general that it will
not be a miserable failure when it is subjected to a further
test $T↓{n+1}$; yet each test gives us more and more confidence
in the randomness of the sequence. In practice, we apply about
half a dozen different kinds of statistical tests to a sequence,
and if it passes these satisfactorily we consider it to be random---\!
it is then presumed innocent until proven guilty.
Every sequence which is to be extensively used should be tested
carefully, so the following sections explain how to carry out
these tests in the right way. Two kinds of tests are distinguished:
{\sl empirical tests}, for which the computer manipulates groups
of numbers of the sequence and evaluates certain statistics;
and {\sl theoretical tests}, for which we establish characteristics
of the sequence by using number-theoretic methods based on
the recurrence rule used to form the sequence.
If the evidence doesn't come out as desired, the reader may
wish to try the techniques in {\sl How to Lie With Statistics}
by Darrell Huff (Norton, 1954).
\runningrighthead{GENERAL TEST PROCEDURES FOR STUDYING RANDOM DATA} section{3.3.1}
\sectionskip
\sectionbegin{3.3.1. General Test Procedures for Studying Random Data}
{\bf A. ``Chi-square'' tests.}\xskip The chi-square
test ($\chi ↑2$ test) is perhaps the best known of all statistical
tests, and it is a basic method which is used in connection
with many other tests. Before considering the method in general,
let us consider a particular example of the chi-square test
as it might be applied to dice throwing. Using two ``true''
dice (each of which, independently, is assumed to register 1,
2, 3, 4, 5, or 6 with equal probability), the following table
gives the probability of obtaining a given total, $s$, on a
single throw:
$$\vcenter{
\tabskip 10pt\halign{\rt{#}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#
$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}⊗\ctr{$#$}\cr
value of $s =$⊗
2⊗3⊗4⊗5⊗6⊗7⊗8⊗9⊗10⊗11⊗12\cr
\noalign{\vskip 3pt}
probability, $p↓s =$⊗1\over36⊗1\over18⊗1\over12⊗
1\over9⊗5\over36⊗1\over6⊗5\over36⊗1\over9⊗1\over12⊗1\over18⊗
1\over36\cr}}\eqno(1)$$
(For example, a value of 4 can be thrown in three ways:
$1 + 3$, $2 + 2$, $3 + 1$; this constitutes ${3\over 36} = {1\over 12}
= p↓4$ of the 36 possible outcomes.)
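The probabilities in (1) are easy to check by machine. A minimal sketch in
modern Python, assuming nothing but the two true dice described above, is:

    from fractions import Fraction
    from collections import Counter

    # Count how many of the 36 equally likely (die 1, die 2) pairs give each total s.
    counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
    p = {s: Fraction(counts[s], 36) for s in range(2, 13)}
    print(p[4])    # 1/12, in agreement with (1)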
If we throw the dice $n$ times, we should obtain the value $s$
approximately $np↓s$ times on the average. For example, in 144
throws we should get the value 4 about 12 times. The following
table shows what results were {\sl actually} obtained in a particular
sequence of 144 throws of the dice:
$$\vcenter{
\tabskip 8pt\halign{\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt
{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}\cr
value of $s =$⊗2⊗3⊗4⊗5⊗6⊗7⊗8⊗9⊗10⊗11⊗12\cr
\noalign{\vskip 3pt}
observed number, $Y↓s =$⊗2⊗4⊗10⊗12⊗22⊗29⊗21⊗15⊗14⊗9⊗6\cr
\noalign{\vskip 3pt}
expected number, $np↓s =$⊗4⊗8⊗12⊗16⊗20⊗24⊗20⊗16⊗12⊗8⊗4\cr}}\eqno(2)$$
Note that the observed number is different from the
expected number in all cases; in fact, random throws of the
dice will hardly ever come out with {\sl exactly} the right
frequencies. There are $36↑{144}$ possible sequences of 144 throws,
all of which are equally likely. One of these sequences consists
of all 2's (``snake eyes''), and anyone throwing 144 snake eyes
in a row would be convinced that the dice were loaded. Yet the
sequence of all 2's is just as probable as any other particular
sequence if we specify the outcome of each throw of each die.
In view of this, how can we test whether or not a given pair
of dice is loaded? The answer is that we can't make a definite
yes-no statement, but we can give a {\sl probabilistic} answer.
We can say how probable or improbable certain types of events
are.
A fairly natural way to proceed in the above example is to consider
the squares of the differences between the observed numbers
$Y↓s$ and the expected numbers $np↓s$. We can add
these together, obtaining
$$V = (Y↓2 - np↓2)↑2 + (Y↓3 - np↓3)↑2 + \cdots +
(Y↓{12} - np↓{12})↑2.\eqno (3)$$ A bad set of dice
should result in a relatively high value of $V$; and for any
given value of $V$ we can ask, ``What is the probability that
$V$ is this high, using true dice?'' If this probability is
very small, say ${1}\over{100}$, we would know that only about one
time in 100 would true dice give results so far away from the
expected numbers, and we would have definite grounds for suspicion.
(Remember, however, that even {\sl good} dice would give such
a high value of $V$ about one time in a hundred, so a cautious person
would repeat the experiment to see if the high value of $V$
is repeated.)
The statistic $V$ in (3) gives equal weight to $(Y↓7 - np↓7)↑2$
and $(Y↓2 - np↓2)↑2$, although $(Y↓7 - np↓7)↑2$ is likely to
be a good deal higher than $(Y↓2 - np↓2)↑2$ since 7's occur
about six times as often as 2's. It turns out that the ``right''
statistic, at least one which has proved to be most important,
will give $(Y↓7 - np↓7)↑2$ only ${1 \over 6}$ as much weight
as $(Y↓2 - np↓2)↑2$, and we should change (3) to the following
formula:
$$V = {(Y↓2 - np↓2)↑2 \over np↓2} + {(Y↓3 - np↓3)↑2 \over np↓3}
+ \cdots + {(Y↓{12} - np↓{12})↑2 \over np↓{12}}.\eqno(4)$$
This is called the ``chi-square'' statistic
of the observed quantities $Y↓2, \ldots, Y↓{12}$ in this dice-throwing
experiment. For the data in (2), we find that
$$V = {(2 - 4)↑2 \over 4} + {(4 - 8)↑2 \over 8} + \cdots
+ {(9 - 8)↑2 \over 8} + {(6 - 4)↑2 \over 4}
= 7{7 \over 48}.\eqno (5)$$ The important question
now is, of course, ``does $7{7 \over 48}$ constitute an improbably
high value for $V$ to assume?'' Before answering this question,
let us consider the general application of the chi-square method.
In general, suppose that every observation can fall into one
of $k$ categories. We take $n$ {\sl independent observations}\/;
this means that the outcome of one observation has absolutely
no effect on the outcome of any of the others. Let $p↓s$ be
the probability that each observation falls into category $s$,
and let $Y↓s$ be the number of observations which actually {\sl do}
fall into category $s$. We form the statistic
$$V = \sum ↓{1≤s≤k} {(Y↓s - np↓s)↑2 \over np↓s}.\eqno(6)$$
In our example above, there are eleven possible
outcomes of each throw of the dice, so $k = 11$. $\biglp$Eq.\ (6) is
a slight change of notation from Eq.\ (4), since we are numbering
the possibilities from 1 to $k$ instead of from 2 to 12.$\bigrp$
By expanding $(Y↓s - np↓s)↑2 = Y↑{2}↓{s} - 2np↓sY↓s + n↑2p↑{2}↓{s}$
in (6) and using the facts that
$$\eqalign{Y↓1 + Y↓2 + \cdots + Y↓k ⊗= n,\cr
p↓1 + p↓2 + \cdots + p↓k ⊗= 1,\cr}\eqno(7)$$
we arrive at the formula
$$V = {1 \over n} \sum ↓{1≤s≤k} \left({Y↑2↓s \over p↓s}\right) - n,\eqno (8)$$
which often makes the computation of $V$ somewhat easier.
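To illustrate, a short Python sketch applying formulas (6) and (8) to the data of
(2) reproduces the value $7{7\over48}$ of (5):

    from fractions import Fraction

    ps = [Fraction(k, 36) for k in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]   # p_2, ..., p_12
    ys = [2, 4, 10, 12, 22, 29, 21, 15, 14, 9, 6]                       # Y_2, ..., Y_12 from (2)
    n = sum(ys)                                                         # 144 throws

    v6 = sum((y - n*p)**2 / (n*p) for y, p in zip(ys, ps))              # formula (6)
    v8 = sum(Fraction(y*y) / p for y, p in zip(ys, ps)) / n - n         # formula (8)
    print(v6, v8)    # both print 343/48, i.e., 7 7/48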
%folio 53 galley 15 (C) Addison-Wesley 1978 *
\yskip Now we turn to the important question, what
constitutes a reasonable value of $V?$ This is found by referring
to a table such as Table 1, which gives values of ``the chi-square
distribution with $\nu$ degrees of freedom'' for various values
of $\nu $. The line of the table with $\nu = k - 1$ is to be
used; {\sl the number of ``degrees of freedom'' is $k - 1$, one
less than the number of categories.} $\biglp$Intuitively, this
means that $Y↓1, Y↓2, \ldots, Y↓k$ are not completely independent,
since Eq.\ (7) shows that $Y↓1$ can be computed if $Y↓2, \ldotss, Y↓k$
are known; hence, $k - 1$ degrees of freedom are present. This
argument is not rigorous, but the theory below justifies it.$\bigrp$
\topinsert{\tablehead{Table 1}
\ctrline{SELECTED PERCENTAGE POINTS OF THE CHI-SQUARE DISTRIBUTION}
\vskip 4 pt plus 3 pt minus 2 pt
\def\\{\vrule height 11.1667pt depth 5.5pt}
\def\0{\hskip 4.5pt} % width of digits in 9pt type
\baselineskip0pt\lineskip0pt
\hrule
\halign{\hjust to 2.375pc{\\\rt{$#\;$}}⊗\hjust to 1.25pc{\lft{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}\cr
⊗⊗$p=1\%$⊗$p=5\%$⊗$p=25\%$⊗$p=50\%$⊗$p=75\%$⊗$p=95\%$⊗$p=99\%$\cr
\noalign{\hrule}
\nu =⊗1⊗\00.00016⊗\00.00393⊗\00.1015⊗\00.4549⊗\01.323⊗\03.841⊗\06.635\cr
\noalign{\hrule}
\nu =⊗2⊗\00.02010⊗\00.1026\0⊗\00.5753⊗\01.386\0⊗\02.773⊗\05.991⊗\09.210\cr
\noalign{\hrule}
\nu =⊗3⊗\00.1148\0⊗\00.3518\0⊗\01.213\0⊗\02.366\0⊗\04.108⊗\07.815⊗11.34\0\cr
\noalign{\hrule}
\nu =⊗4⊗\00.2971\0⊗\00.7107\0⊗\01.923\0⊗\03.357\0⊗\05.385⊗\09.488⊗13.28\0\cr
\noalign{\hrule}
\nu =⊗5⊗\00.5543\0⊗\01.1455\0⊗\02.675\0⊗\04.351\0⊗\06.626⊗11.07\0⊗15.09\0\cr
\noalign{\hrule}
\nu =⊗6⊗\00.8720\0⊗\01.635\0\0⊗\03.455\0⊗\05.348\0⊗\07.841⊗12.59\0⊗16.81\0\cr
\noalign{\hrule}
\nu =⊗7⊗\01.239\0\0⊗\02.167\0\0⊗\04.255\0⊗\06.346\0⊗\09.037⊗14.07\0⊗18.48\0\cr
\noalign{\hrule}
\nu =⊗8⊗\01.646\0\0⊗\02.733\0\0⊗\05.071\0⊗\07.344\0⊗10.22\0⊗15.51\0⊗20.09\0\cr
\noalign{\hrule}
\nu =⊗9⊗\02.088\0\0⊗\03.325\0\0⊗\05.899\0⊗\08.343\0⊗11.39\0⊗16.92\0⊗21.67\0\cr
\noalign{\hrule}
\nu =⊗10⊗\02.558\0\0⊗\03.940\0\0⊗\06.737\0⊗\09.342\0⊗12.55\0⊗18.31\0⊗23.21\0\cr
\noalign{\hrule}
\nu =⊗11⊗\03.053\0\0⊗\04.575\0\0⊗\07.584\0⊗10.34\0\0⊗13.70\0⊗19.68\0⊗24.73\0\cr
\noalign{\hrule}
\nu =⊗12⊗\03.571\0\0⊗\05.226\0\0⊗\08.438\0⊗11.34\0\0⊗14.84\0⊗21.03\0⊗26.22\0\cr
\noalign{\hrule}
\nu =⊗15⊗\05.229\0\0⊗\07.261\0\0⊗11.04\0\0⊗14.34\0\0⊗18.25\0⊗25.00\0⊗30.58\0\cr
\noalign{\hrule}
\nu =⊗20⊗\08.260\0\0⊗10.85\0\0\0⊗15.45\0\0⊗19.34\0\0⊗23.83\0⊗31.41\0⊗37.57\0\cr
\noalign{\hrule}
\nu =⊗30⊗14.95\0\0\0⊗18.49\0\0\0⊗24.48\0\0⊗29.34\0\0⊗34.80\0⊗43.77\0⊗50.89\0\cr
\noalign{\hrule}
\nu =⊗50⊗29.71\0\0\0⊗34.76\0\0\0⊗42.94\0\0⊗49.33\0\0⊗56.33\0⊗67.50\0⊗76.15\0\cr
\noalign{\hrule}
\noalign{\hjust{\hjust to 2.375pc{\\\rt{$\nu>\;$}}\hjust to 1.25pc{\lft{30}\\}\!
\hjust to 25.375pc{\ctr{$\nu+\sqrt{2\nu}x↓p+{2\over3}x↓p↑2-{2\over3}+
O\left(1/\sqrt\nu\,\right)$}\\}}}
\noalign{\hrule}
x↓p⊗$=$⊗$-2.33$⊗$-1.64$⊗$-.675$⊗0.00⊗0.675⊗1.64⊗2.33\cr}
\hrule
\vskip 6 pt plus 3 pt minus 2 pt
\eightpoint\hjust to size{
(For further values, see {\sl
Handbook of Mathematical Functions}, ed.\ by M. Abramowitz and
I. A. Stegun (Washington, D.C.: U.S. Government Printing Office, 1964),
Table 26.8.)} %end of \hjust to size
} %end of \topinsert
If the table entry in row $\nu$ under column $p$ is $x$, it
means, ``The quantity $V$ in Eq.\ (8) will be less than or equal to $x$
with approximate probability $p$, if $n$ is
large enough.'' For example, the 95 percent entry in
row 10 is 18.31; this says we will have $V > 18.31$ only about 5 percent
of the time.
Let us assume that the above dice-throwing experiment is simulated
on a computer using some sequence of supposedly random numbers,
with the following results:
$$\vcenter{
\tabskip 8pt\halign{\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt
{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}\cr
value of $s =$⊗2⊗3⊗4⊗5⊗6⊗7⊗8⊗9⊗10⊗11⊗12\cr
\noalign{\vskip 3pt}
Experiment 1,\quad$Y↓s =$⊗4⊗10⊗10⊗13⊗20⊗18⊗18⊗11⊗13⊗14⊗13\cr
\noalign{\vskip 3pt}
Experiment 2,\quad $Y↓s =$⊗3⊗7⊗11⊗15⊗19⊗24⊗21⊗17⊗13⊗9⊗5\cr}}\eqno(9)$$
We can compute the chi-square statistic in the first
case, getting the value $V↓1 = 29{59 \over 120}$, and in the second
case we get $V↓2 = 1{17 \over 120}$. Referring to
the table entries for 10 degrees of freedom, we see that $V↓1$
is {\sl much too high}\/; $V$ will be greater than 23.21 only about
one percent of the time! (By using more extensive tables, we
find in fact that $V$ will be as high as $V↓1$ only 0.1 percent
of the time.) Therefore Experiment 1 represents a significant
departure from random behavior.
On the other hand, $V↓2$ is quite low, since the observed values $Y↓s$ in Experiment
2 are quite close to the expected values $np↓s$ in (2). The
chi-square table tells us, in fact, that
$V↓2$ is {\sl much too low}\/\hskip .5pt: the observed
values are so close to the expected values that we cannot consider
the result to be random! (Indeed, reference to other tables
shows that such a low value of $V$ occurs only 0.03 percent
of the time when there are 10 degrees of freedom.) Finally,
the value $V = 7{7 \over 48}$ computed in (5) can also be checked
with Table 1. It falls between the entries for
25 percent and 50 percent, so we cannot consider it to be significantly
high or significantly low; thus the observations in (2) are satisfactorily
random with respect to this test.
It is somewhat remarkable that the same table entries are used
no matter what the value of $n$ is, and no matter what the probabilities
$p↓s$ are. Only the number $\nu = k - 1$ affects the results.
In actual fact, however, the table entries are not exactly correct: {\sl
the chi-square distribution is an approximation which is valid
only for large enough values of $n$.} How large should $n$ be?
A common rule of thumb is to take $n$ large enough so that each
of the expected values $np↓s$ is five or more; preferably, however,
take $n$ much larger than this, to get a more powerful test.
Note that in our examples above we took $n = 144$, so $np↓2$
was only 4, violating the stated ``rule of thumb.'' This was
done only because the author tired of throwing the dice; it
makes the entries in Table 1 less accurate for our application.
Experiments run on a computer, with $n = 1000$, or 10000, or
even 100000, would be much better than this. We could also combine
the data for $s = 2$ and $s = 12$; then the test would have only nine
degrees of freedom but the chi-square approximation would be more accurate.
We can get an idea of how crude an approximation is involved by considering the
case when there are only two categories, having respective probabilities
$p↓1$ and $p↓2$. Suppose $p↓1={1\over4}$ and $p↓2={3\over4}$. According to
the stated rule of thumb, we should have $n≥20$ to have a satisfactory
approximation, so let's check this out. When $n=20$, the possible values of
$V$ are ${4\over15}r↑2$ for $-5≤r≤15$; we wish to know how well the
$\nu=1$ row of Table 1 describes the distribution of $V$. The chi-square
distribution varies continuously, while the actual distribution of $V$ has
rather big jumps, so we need some convention for representing the exact
distribution. If the distinct possible outcomes of the experiment lead to the
values $V↓0≤V↓1≤\cdots≤V↓n$ with respective probabilities $π↓0$, $π↓1$, $\ldotss$,
$π↓n$, let us represent percentage point $p$ by the pair $[V↓j,V↓{j+1}]$ when
$π↓0+\cdots+π↓j≤p<π↓0+\cdots+π↓j+π↓{j+1}$. Then the percentage points of the
exact distribution for $n=20$, corresponding to the approximations in Table 1
for $p=1\%$, $5\%$, $25\%$, $50\%$, $75\%$, $95\%$, and $99\%$, respectively, are
$$[0,0],\ [0,0],\ [0,.27],\ [.27,.27],\ [1.07,1.07],\ [2.4,4.27],\ [6.67,6.67].$$
For example, this tableau says that the relevant value of $V$ when $p=75\%$ is
1.07, while Table 1 gives the estimate 1.323. (More precisely, one can verify that
for $n=20$ the probability is about .56 that $V≤{4\over15}$, and the next
higher value that $V$ can assume is ${16\over15}=1.07$; this value can occur in
two ways having probabilities about .13 and .11, so the cumulative probability
for $V≤1.07$ straddles .75, as required.) When $n=21$, the situation changes
slightly because the expected values $np↓1=5.25$ and $np↓2=15.75$ can never
be obtained exactly; the tableau for $n=21$ is
$$[-,.02],\ [-,.02],\ [.02,.14],\ [.14,.40],\ [.78,1.29],\ [2.68,3.57],\ [4.59,
5.73].$$
We would expect Table 1 to be a better approximation when $n=50$, but the
corresponding tableau actually turns out to be further from Table 1 in some
respects than it was for $n=20$:
$$[-,.03],\ [-,.03],\ [.03,.03],\ [.24,.67],\ [1.31,1.31],\ [3.23,3.23],\ [6.0,
6.0].$$ Here are the values when $n=300$:
$$[0,0],\ [0,0],\ [.07,.07],\ [.44,.44],\ [1.14,1.44],\ [3.48,4.0],\ [6.42,
6.42].$$
Even in this case, when $np↓s$ is $≥75$ in each category, the entries in
Table 1 are good to only about one significant digit.
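These exact calculations are easy to repeat; the following Python sketch, for the
case $k=2$, $p↓1={1\over4}$, $n=20$ considered above, verifies the quoted
probabilities .56, .13, and .11:

    from math import comb
    from fractions import Fraction

    n, p1 = 20, Fraction(1, 4)
    p2 = 1 - p1

    def chi2_stat(y1):        # V when Y_1 = y1 and Y_2 = n - y1
        return (y1 - n*p1)**2/(n*p1) + ((n - y1) - n*p2)**2/(n*p2)

    def prob(y1):             # binomial probability of that outcome
        return comb(n, y1) * p1**y1 * p2**(n - y1)

    # probability that V <= 4/15 (quoted as "about .56"):
    print(float(sum(prob(y) for y in range(n + 1) if chi2_stat(y) <= Fraction(4, 15))))
    # the two outcomes with V = 16/15 (quoted as "about .13 and .11"):
    print([float(prob(y)) for y in range(n + 1) if chi2_stat(y) == Fraction(16, 15)])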
The proper choice of $n$ is somewhat obscure. If the dice are
actually biased, the fact will be detected as $n$ gets larger
and larger. (Cf.\ exercise 12.) But large values of $n$ will
tend to smooth out {\sl locally} nonrandom behavior, i.e., blocks
of numbers with a strong bias followed by blocks of numbers
with the opposite bias. This type of behavior would not happen
when actual dice are rolled, since the same dice are used throughout
the test, but a sequence of numbers generated
on a computer might very well display such locally nonrandom
behavior. Perhaps a chi-square test should be made for a number
of different values of $n$. At any rate, $n$ should always be
rather large.
\yskip We can summarize the
chi-square test as follows. A fairly large number, $n$, of independent
observations is made. (It is important to avoid using the chi-square
method unless the observations are independent. See, for example,
exercise 10, which considers the case when half of the observations
depend on the other half.) We count the number of observations
falling into each of $k$ categories and compute the quantity
$V$ given in Eqs.\ (6) and (8). Then $V$ is compared with the
numbers in Table 1, with $\nu = k - 1$. If $V$ is less than
the $1\%$ entry or greater than the $99\%$ entry,
we reject the numbers as not sufficiently random. If $V$ lies
between the $1\%$ and $5\%$ entries or between the $95\%$ and $99\%$ entries,
the numbers are ``suspect''; if (by interpolation in the table) $V$ lies
between the $5\%$ and $10\%$ entries, or the $90\%$ and $95\%$ entries, the numbers
might be ``almost suspect.'' The chi-square test is often done
at least three times on different sets of data, and if at least
two of the three results are suspect the numbers are regarded
as not sufficiently random.
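A sketch of this procedure in Python, using the $\nu=10$ row of Table 1 as cutoffs
so that it applies to the dice experiments above (the ``almost suspect'' range,
which needs interpolated $10\%$ and $90\%$ points, is omitted):

    # nu = 10 percentage points, copied from Table 1
    TABLE1_NU10 = {1: 2.558, 5: 3.940, 25: 6.737, 50: 9.342, 75: 12.55, 95: 18.31, 99: 23.21}

    def judge(v, pts=TABLE1_NU10):
        """Classify a chi-square statistic as described in the summary above."""
        if v < pts[1] or v > pts[99]:
            return "reject"
        if v < pts[5] or v > pts[95]:
            return "suspect"
        return "pass"

    print(judge(29 + 59/120))   # Experiment 1 of (9): reject (much too high)
    print(judge(1 + 17/120))    # Experiment 2 of (9): reject (much too low)
    print(judge(7 + 7/48))      # the data of (2): pass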
\topinsert{\vskip 58mm
\ctrline{\caption Fig. 2. Indications of ``significant''
deviations in 90 chi-square tests (cf.\ also Fig.\ 5).}}
For example, see Fig.\ 2, which shows schematically
the results of applying five different types of chi-square tests
on each of six sequences of random numbers. Each test has been
applied to three different blocks of numbers of the sequence.
Generator A is the MacLaren--Marsaglia method (Algorithm 3.2.2M
applied to the sequences in 3.2.2--12),
Generator E is the Fibonacci method, and the other generators
are linear congruential sequences with the following parameters:
$$\halign{\rt{#}⊗\quad\lft{$#$}\cr
Generator B:⊗X↓0 = 0,\quad a = 3141592653,\quad
c = 2718281829,\quad m = 2↑{35}.\cr
\noalign{\yskip}
Generator C:⊗X↓0 = 0,\quad a = 2↑7 + 1,\quad
c = 1,\quad m = 2↑{35}.\cr
\noalign{\yskip}
Generator D:⊗X↓0 = 47594118,\quad a = 23,\quad
c = 0,\quad m = 10↑8 + 1.\cr
\noalign{\yskip}
Generator F:⊗X↓0 = 314159265,\quad a = 2↑{18} + 1,\quad
c = 1,\quad m = 2↑{35}.\cr}$$
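The linear congruential generators just listed are easily programmed; here is a
Python sketch (Generators A and E, which are not linear congruential, are
described above and not repeated here):

    def lcg(x0, a, c, m):
        """Yield X_1, X_2, ... where X_{n+1} = (a*X_n + c) mod m."""
        x = x0
        while True:
            x = (a * x + c) % m
            yield x

    gen_B = lcg(0,         3141592653, 2718281829, 2**35)
    gen_C = lcg(0,         2**7 + 1,   1,          2**35)
    gen_D = lcg(47594118,  23,         0,          10**8 + 1)
    gen_F = lcg(314159265, 2**18 + 1,  1,          2**35)

    u = next(gen_B) / 2**35        # a uniform deviate in [0, 1)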
From Fig.\ 2 we conclude that (so far as these tests
are concerned) Generators A, B, D are satisfactory, Generator
C is on the borderline and should probably be rejected, Generators
E and F are definitely unsatisfactory. Generator F has, of course,
low potency; Generators C and D have been discussed in the literature,
but their multipliers are too small. (Generator D is the original
multiplicative generator proposed by Lehmer in 1948; Generator
C is the original linear congruential generator with $c ≠ 0$
proposed by Rotenberg in 1960.)
Instead of using the ``suspect,'' ``almost suspect,'' etc.,
criteria for judging the results of chi-square tests, there
is a less {\sl ad hoc} procedure available, which will be discussed
later in this section.
%folio 57 galley 1 (C) Addison-Wesley 1978 *
\topinsert{\vskip 81mm
\ctrline{\caption Fig. 3. Examples of distribution functions.}}
\subsectionbegin{B. The Kolmogorov--\hskip 1pt Smirnov test}
As we have seen, the chi-square test applies to the situation
when observations can fall into a finite number of categories,
$k$. It is not unusual, however, to consider random quantities
which may assume infinitely many values. For example, a random
real number between 0 and 1 may take on infinitely many values;
even though only a finite number of these can be represented
in the computer, we want our random values to behave essentially
as though they were random real numbers.
A general notation for specifying probability distributions,
whether they are finite or infinite, is commonly used in the
study of probability and statistics. Suppose we want to specify
the distribution of the values of a random quantity, $X$; we
do this in terms of the {\sl distribution function $F(x)$}, where
$$F(x) = \hjust{probability that }(X ≤ x).$$
Three examples are shown in Fig.\ 3. First we see
the distribution function for a {\sl random bit}, i.e., for
the case when $X$ takes on only the two values 0 and 1, each
with probability ${1\over 2}$. Part (b) of the figure shows
the distribution function for a {\sl uniformly distributed random
real number} between zero and one, so the probability that $X
≤ x$ is simply equal to $x$ when $0 ≤ x ≤ 1$; for example, the
probability that $X ≤ {2\over 3}$ is, naturally, ${2\over 3}$.
And part (c) shows the limiting distribution of the value $V$
in the chi-square test (shown here with 10 degrees of freedom);
this is a distribution which we have already seen represented
in another way in Table 1. Note that $F(x)$ always increases
from 0 to 1 as $x$ increases from $-∞$ to $+∞$.
\topinsert{\vskip 516pt
\ctrline{\caption Fig. 4. Examples of empirical distributions.}}
%folio 61 galley 2 (C) Addison-Wesley 1978 *
If we make $n$ independent observations of the random
quantity $X$, thereby obtaining the values $X↓1$, $X↓2$, $\ldotss$,
$X↓n\,$, we can form the {\sl empirical distribution $F↓n(x)$},
where
$$F↓n(x) =
{\raise 3pt\hjust{$\hjust{number of }X↓1, X↓2, \ldotss, X↓n\hjust{ which are} ≤x$}
\over n} .\eqno (10)$$
Figure 4 illustrates three empirical distribution
functions $\biglp$shown as zigzag lines, although strictly speaking
the vertical lines are not part of the graph of $F↓n(x)\bigrp$,
superimposed on a graph of the assumed actual distribution
function $F(x)$. As $n$ gets large, $F↓n(x)$ should be a better
and better approximation to $F(x)$.
The Kolmogorov--\hskip 1pt Smirnov test (KS test) may be used when $F(x)$
has no jumps. It is based on the {\sl difference between $F(x)$ and
$F↓n(x)$}. A bad source of random numbers will give empirical distribution
functions that do not approximate $F(x)$ sufficiently well.
Figure 4(b) shows an example in which the $X↓i$ are consistently
too high, so the empirical distribution function is too low.
Part (c) of the figure shows an even worse example; it is plain
that such great deviations between $F↓n(x)$ and $F(x)$ are extremely
improbable, and the KS test is used to tell us how improbable
they are.
To make the test, we form the following statistics:
$$\eqalign{K↑{+}↓{n}⊗=\sqrt n\max↓{-∞<x<+∞}\biglp F↓n(x)
- F(x)\bigrp;\cr
K↑{-}↓{n} ⊗=\sqrt n\max↓{-∞<x<+∞}\biglp F(x)
- F↓n(x)\bigrp.\cr}\eqno(11)$$
Here $K↑{+}↓{n}$ measures the greatest amount of deviation
when $F↓n$ is greater than $F$, and $K↑{-}↓{n}$ measures the
maximum deviation when $F↓n$ is less than $F$. The statistics
for the examples of Fig.\ 4 are:
$$\vcenter{\halign{\rt{$# $}\qquad⊗\ctr{#}\qquad⊗\ctr{#}\qquad⊗\ctr{#}\cr
⊗Part (a)⊗Part (b)⊗Part (c)\cr
\noalign{\vskip 4pt}
K↑{+}↓{20}⊗0.492⊗0.134⊗0.313\cr
\noalign{\vskip 3pt}
K↑{-}↓{20}⊗0.536⊗1.027⊗2.101\cr}}\eqno\lower 7.5pt\hjust{(12)}$$
({\sl Note:} The factor $\sqrt{n}$ which appears
in Eqs.\ (11) may seem puzzling at first. Exercise 6 shows that,
for fixed $x$, the standard deviation of $F↓n(x)$ is proportional
to $1/\sqrt n$; hence the factor $\sqrt{n}$ magnifies the
statistics $K↑{+}↓{n}$, $K↑{-}↓{n}$ in such a way that this standard
deviation is independent of $n$.)
As in the chi-square test, we may now look up the values
$K↑{+}↓{n}$, $K↑{-}↓{n}$ in a ``percentile'' table to determine
if they are significantly high or low. Table 2 may be used for
this purpose, both for $K↑{+}↓{n}$ and $K↑{-}↓{n}$. For example,
the probability is 75 percent that $K↑-↓{20}$ will be 0.7975 or less.
Unlike the chi-square test, the table entries are
{\sl not} merely approximations which hold for large values
of $n$; the table gives exact values (except, of course, for roundoff
error), and the KS
test may be reliably used for any value of $n$.
\topinsert{\tablehead{Table 2}
\ctrline{SELECTED PERCENTAGE POINTS OF THE DISTRIBUTIONS $K↑+↓n$ AND $K↑-↓n$}
\vskip 4 pt plus 3 pt minus 2 pt
\def\\{\vrule height 11.1667pt depth 5.5pt}
\def\0{\hskip 4.5pt} % width of digits in 9pt type
\baselineskip0pt\lineskip0pt
\hrule
\halign{\hjust to 2.375pc{\\\rt{$#\;$}}⊗\hjust to 1.25pc{\lft{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}⊗\!
\hjust to 3.625pc{\ctr{#}\\}\cr
⊗⊗$p=1\%$⊗$p=5\%$⊗$p=25\%$⊗$p=50\%$⊗$p=75\%$⊗$p=95\%$⊗$p=99\%$\cr
\noalign{\hrule}
n =⊗1⊗0.01000⊗0.05000⊗0.2500⊗0.5000⊗0.7500⊗0.9500⊗0.9900\cr
\noalign{\hrule}
n =⊗2⊗0.01400⊗0.06749⊗0.2929⊗0.5176⊗0.7071⊗1.0980⊗1.2728\cr
\noalign{\hrule}
n =⊗3⊗0.01699⊗0.07919⊗0.3112⊗0.5147⊗0.7539⊗1.1017⊗1.3589\cr
\noalign{\hrule}
n =⊗4⊗0.01943⊗0.08789⊗0.3202⊗0.5110⊗0.7642⊗1.1304⊗1.3777\cr
\noalign{\hrule}
n =⊗5⊗0.02152⊗0.09471⊗0.3249⊗0.5245⊗0.7674⊗1.1392⊗1.4024\cr
\noalign{\hrule}
n =⊗6⊗0.02336⊗0.1002\0⊗0.3272⊗0.5319⊗0.7703⊗1.1463⊗1.4144\cr
\noalign{\hrule}
n =⊗7⊗0.02501⊗0.1048\0⊗0.3280⊗0.5364⊗0.7755⊗1.1537⊗1.4246\cr
\noalign{\hrule}
n =⊗8⊗0.02650⊗0.1086\0⊗0.3280⊗0.5392⊗0.7797⊗1.1586⊗1.4327\cr
\noalign{\hrule}
n =⊗9⊗0.02786⊗0.1119\0⊗0.3274⊗0.5411⊗0.7825⊗1.1624⊗1.4388\cr
\noalign{\hrule}
n =⊗10⊗0.02912⊗0.1147\0⊗0.3297⊗0.5426⊗0.7845⊗1.1658⊗1.4440\cr
\noalign{\hrule}
n =⊗11⊗0.03028⊗0.1172\0⊗0.3330⊗0.5439⊗0.7863⊗1.1688⊗1.4484\cr
\noalign{\hrule}
n =⊗12⊗0.03137⊗0.1193\0⊗0.3357⊗0.5453⊗0.7880⊗1.1714⊗1.4521\cr
\noalign{\hrule}
n =⊗15⊗0.03424⊗0.1244\0⊗0.3412⊗0.5500⊗0.7926⊗1.1773⊗1.4606\cr
\noalign{\hrule}
n =⊗20⊗0.03807⊗0.1298\0⊗0.3461⊗0.5547⊗0.7975⊗1.1839⊗1.4698\cr
\noalign{\hrule}
n =⊗30⊗0.04354⊗0.1351\0⊗0.3509⊗0.5605⊗0.8036⊗1.1916⊗1.4801\cr
\noalign{\hrule}
\noalign{\hjust{\hjust to 2.375pc{\\\rt{$n>\;$}}\hjust to 1.25pc{\lft{30}\\}\!
\hjust to 25.375pc{\ctr{$y↓p-1\left/\left(6\sqrt n\,\right)\right. + O(1/n),
\hjust{ where }y↓p↑2={1\over2}\ln(1/(1-p))$}\\}}}
\noalign{\hrule}
y↓p⊗$=$⊗0.07089⊗0.1601⊗0.3793⊗0.5887⊗0.8326⊗1.2239⊗1.5174\cr}
\hrule}
As they stand, formulas
(11) are not readily adapted to computer calculation, since
we are asking for a maximum over infinitely many values of $x$.
From the fact that $F(x)$ is increasing and the fact that $F↓n(x)$
increases only in finite steps, however, we can derive a
simple procedure for evaluating the statistics $K↑{+}↓{n}$ and
$K↑{-}↓{n}$:
\yskip\noindent{\sl Step 1.}\hskip 7pt
Obtain the observations $X↓1, X↓2, \ldotss ,
X↓n\,$.
\yskip\noindent{\sl Step 2.}\hskip 7pt
Rearrange the observations so that
they are sorted into ascending order, i.e., so that $X↓1 ≤ X↓2
≤ \cdots ≤ X↓n$. (Efficient sorting algorithms are the subject
of Chapter 5. On the other hand, it is possible to avoid sorting
in this particular application, as shown in exercise 23.)
\yskip\noindent{\sl Step 3.}\hskip 7pt
The desired statistics are now given by the formulas
$$\eqalign{
K↑{+}↓{n}⊗=\sqrt n \max↓{1≤j≤n}\left({j\over n} - F(X↓j)\right);\cr
K↑{-}↓{n}⊗=\sqrt n \max↓{1≤j≤n}\left(F(X↓j) - {j - 1\over n}\right).\cr}\eqno(13)$$
\yskip
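In Python, Steps 1--3 amount to only a few lines; the following sketch of formulas
(13) takes the distribution function $F$ as a parameter:

    from math import sqrt

    def ks_statistics(xs, F):
        """Return (K+_n, K-_n) of (13) for observations xs and distribution function F."""
        n = len(xs)
        xs = sorted(xs)                                               # Step 2
        k_plus  = sqrt(n) * max((j + 1)/n - F(x) for j, x in enumerate(xs))
        k_minus = sqrt(n) * max(F(x) - j/n for j, x in enumerate(xs))
        return k_plus, k_minus

    # e.g., against the uniform distribution F(x) = x of Fig. 3(b):
    #     kp, km = ks_statistics(observations, lambda x: x)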
An appropriate choice of the number of observations,
$n$, is slightly easier to make for this test than it is for
the $\chi↑2$ test, although some of the considerations are
similar. If the random variables $X↓j$ actually belong to the
probability distribution $G(x)$, while they were assumed to
belong to the distribution given by $F(x)$, it will take a comparatively
large value of $n$ to reject the hypothesis that $G(x) = F(x)$;
for we need $n$ large enough that the empirical distributions
$G↓n(x)$ and $F↓n(x)$ are expected to be observably different.
On the other hand, large values of $n$ will tend to average
out locally nonrandom behavior, and such behavior is an undesirable
characteristic that is of significant importance in most computer
applications of random numbers; this makes a case for {\sl smaller}
values of $n$. A good compromise would be to
take $n$ equal to, say, 1000, and to make a fairly large number
of calculations of $K↑{+}↓{1000}$ on different parts of a random
sequence, thereby obtaining values
$$K↑{+}↓{1000}(1),\qquad K↑{+}↓{1000}(2),\qquad \ldotss ,\qquad
K↑{+}↓{1000}(r).\eqno (14)$$
We can also apply the KS test {\sl again} to {\sl
these} results: Let $F(x)$ now be the distribution function
for $K↑{+}↓{1000}$, and determine the empirical distribution
$F↓r(x)$ obtained from the observed values in (14). Fortunately,
the function $F(x)$ in this case is very simple; for a large
value of $n$ like $n = 1000$, the distribution of $K↑{+}↓{n}$
is closely approximated by
$$F↓∞(x) = 1 - e↑{-2x↑2},\qquad x ≥ 0.\eqno (15)$$
The same remarks apply to $K↑{-}↓{n}$, since $K↑{+}↓{n}$
and $K↑{-}↓{n}$ have the same expected behavior. {\sl This method
of using several tests for moderately large n, then combining
the observations later in another KS test, will tend to detect
both local and global nonrandom behavior.}
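A sketch of this two-level procedure, built on the ks_statistics routine above
(the source of the uniform deviates is left unspecified):

    from math import exp

    def ks_of_ks(uniforms, r=20, n=1000):
        """Split r*n uniform deviates into r groups, compute K+_n for each group
        against F(x) = x, then test those r values against F_infinity of (15)."""
        values = []
        for i in range(r):
            group = uniforms[i*n:(i + 1)*n]
            k_plus, _ = ks_statistics(group, lambda x: x)
            values.append(k_plus)
        return ks_statistics(values, lambda x: 1 - exp(-2*x*x))   # compare with Table 2, row r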
%folio 64 galley 3 (C) Addison-Wesley 1978 *
An experiment of this type (although on a much smaller
scale) was made by the author as this chapter was being written.
The ``maximum of 5'' test described in the next section was
applied to a set of 1000 uniform random numbers, yielding 200
observations $X↓1$, $X↓2$, $\ldotss$, $X↓{200}$ which were supposed
to belong to the distribution $F(x) = x↑5\;(0 ≤ x ≤ 1)$. The
observations were divided into 20 groups of 10 each, and the
statistic $K↑{+}↓{10}$ was computed for each group. The 20 values
of $K↑{+}↓{10}$, thus obtained, led to the empirical distributions
shown in Fig.\ 4. The smooth curve shown in each of the diagrams
in Fig.\ 4 is the actual distribution the statistic $K↑{+}↓{10}$
should have. Figure 4(a) shows the empirical distribution of
$K↑{+}↓{10}$ obtained from the sequence
$$Y↓{n+1} = (3141592653Y↓n + 2718281829)\mod 2↑{35},\qquad
U↓n = Y↓n/2↑{35},$$
and it is satisfactorily random. Part (b) of the
figure came from the Fibonacci method; this sequence has {\sl
globally} nonrandom behavior, i.e., it can be shown that the
observations $X↓n$ in the ``maximum of 5'' test do not have
the correct distribution $F(x) = x↑5$. Part (c) came from the
notorious and impotent linear congruential sequence $Y↓{n+1}
= \left( (2↑{18} + 1)Y↓n + 1\right)\mod 2↑{35}$, $U↓n = Y↓n/2↑{35}$.
The KS test applied to the data in Fig.\ 4 gives the results
shown in (12). Referring to Table 2 for $n = 20$, we see that
the values of $K↑{+}↓{20}$ and $K↑{-}↓{20}$ for Fig.\ 4(b) are
almost suspect (they lie at about the 5 percent and 88 percent
levels) but not quite bad enough to be rejected outright. The
value of $K↑{-}↓{20}$ for Part (c) is, of course, completely
out of line, so the ``maximum of 5'' test shows a definite failure
of that random-number generator.
We would expect the KS test in this experiment to have more
difficulty locating global nonrandomness than local nonrandomness,
since the basic observations in Fig.\ 4 were made on samples
of only 10 numbers each. If we were to take 20 groups of
1000 numbers each, part (b) would show a much more significant
deviation. To illustrate this point, a {\sl single} KS test
was applied to all 200 of the observations which led to Fig.\
4, and the following results were obtained:
$$\vcenter{\halign{\rt{$# $}\qquad⊗\ctr{#}\qquad⊗\ctr{#}\qquad⊗\ctr{#}\cr
⊗Part (a)⊗Part (b)⊗Part (c)\cr
\noalign{\vskip 4pt}
K↑{+}↓{200}⊗0.477⊗1.537⊗2.819\cr
\noalign{\vskip 3pt}
K↑{-}↓{200}⊗0.817⊗0.194⊗0.058\cr}}\eqno\lower 7.5pt\hjust{(16)}$$
The global nonrandomness of the Fibonacci generator has
definitely been detected here.
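A reader with a computer can redo this 200-observation test; the sketch below uses
the lcg and ks_statistics routines from the earlier sketches, with an assumed
starting value $Y↓0=0$ (the seed actually used for Fig.\ 4 is not recorded, so
exact agreement with (16) should not be expected):

    gen = lcg(0, 3141592653, 2718281829, 2**35)      # the generator of Fig. 4(a), seed assumed
    us = [next(gen) / 2**35 for _ in range(1000)]    # 1000 uniform deviates
    xs = [max(us[5*i:5*i + 5]) for i in range(200)]  # "maximum of 5" observations
    print(ks_statistics(xs, lambda x: x**5))         # compare with the Part (a) column of (16)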
\yskip We may summarize the Kolmogorov--\hskip 1pt Smirnov test as follows. We
are given $n$ {\sl independent observations} $X↓1$, $\ldotss$, $X↓n$
taken from some distribution specified by a {\sl continuous}
function $F(x)$. That is, $F(x)$ must be like the functions
shown in Fig.\ 3(b) and 3(c), having no jumps like those in Fig.\
3(a). The procedure explained just before Eqs.\ (13) is carried
out on these observations, so we obtain the statistics $K↑{+}↓{n}$
and $K↑{-}↓{n}$. These statistics should be distributed according
to Table 2.
Some comparisons between the KS test and the $\chi ↑2$ test
can now be made. In the first place, we should
observe that the KS test may be used in conjunction with
the $\chi ↑2$ test, to give a better procedure than the {\sl
ad hoc} method we mentioned when summarizing the $\chi ↑2$ test.
(That is, there is a better way to proceed than to make three
tests and to consider how many of the results were ``suspect'').
Suppose we have made, say, 10 independent $\chi ↑2$ tests on
different parts of a random sequence, so that values $V↓1$, $V↓2$,
$\ldotss$, $V↓{10}$ have been obtained. It is not a good policy
simply to count how many of the $V$'s are suspiciously large
or small. This procedure will work in extreme cases,
and very large or very small values may mean that the
sequence has too much local nonrandomness; but a better general
method would be to plot the empirical distribution of these 10 values
and to compare it to the correct distribution, which may
be obtained from Table 1. This would give a clearer picture of the
results of the $\chi ↑2$ tests, and in fact the statistics $K↑{+}↓{10}$
and $K↑{-}↓{10}$ could be determined as an indication of the success
or failure. With only 10 values or even as many as 100 this
could all be done easily by hand, using graphical methods; with
a larger number of $V$'s, a computer subroutine for calculating
the chi-square distribution would be necessary. Notice that
{\sl all 20 of the observations in Fig.\ 4(c) fall between the
5 and 95 percent levels}, so we would not have regarded {\sl
any} of them as suspicious, individually; yet collectively the
empirical distribution shows that these observations are not at
all right.
An important difference between the KS test and the chi-square
test is that the KS test applies to distributions $F(x)$ having
no jumps, while the chi-square test applies to distributions
having nothing but jumps (since all observations are divided
into $k$ categories). The two tests are thus intended for different
sorts of applications. Yet it is possible to apply the $\chi
↑2$ test even when $F(x)$ is continuous, if we divide the domain
of $F(x)$ into $k$ parts and ignore all variations within each
part. For example, if we want to test whether or not $U↓1$, $U↓2$,
$\ldotss$, $U↓n$ can be considered to come from the uniform distribution
between zero and one, we want to test if they have the distribution
$F(x) = x$ for $0 ≤ x ≤ 1$. This is a natural application for
the KS test. But we might also divide up the interval from 0
to 1 into $k = 100$ equal parts, count how many $U$'s fall into
each part, and apply the chi-square test with 99 degrees of
freedom. There are not many theoretical results available at
the present time to compare the effectiveness of the KS test
versus the chi-square test. The author has found some examples
in which the KS test pointed out nonrandomness more clearly
than the $\chi ↑2$ test, and others in which the $\chi ↑2$ test
gave a more significant result. If, for example, the 100 categories
mentioned above are numbered 0, 1, $\ldotss$, 99, and if the deviations
from the expected values are positive in compartments 0 to 49
but negative in compartments 50 to 99, then the empirical distribution
function will be much further from $F(x)$ than the $\chi ↑2$
value would indicate; but if the positive deviations occur in
compartments 0, 2, $\ldotss$, 98 and the negative ones occur in
1, 3, $\ldotss$, 99, the empirical distribution function will
tend to hug $F(x)$ much more closely. The kinds of deviations
measured are therefore somewhat different. A $\chi ↑2$ test
was applied to the 200 observations which led to Fig.\ 4, with
$k = 10$, and the respective values of $V$ were 9.4, 17.7, and
39.3; so in this particular case the values are quite comparable
to the KS values given in (16). Since the $\chi ↑2$ test is
intrinsically less accurate, and since it requires comparatively
large values of $n$, the KS test has several advantages when
a continuous distribution is to be tested.
\topinsert{\vskip 73mm
\ctrline{\caption Fig. 5. The KS tests applied to the same data as Fig. 2.}}
A further example will also be
of interest. The data which led to Fig.\ 2 were chi-square statistics
based on $n = 200$ observations of the ``maximum of $t\,$'' criterion
for $1 ≤ t ≤ 5$, with the range divided into 10 equally probable
parts. KS statistics $K↑{+}↓{200}$ and $K↑{-}↓{200}$ can be
computed from exactly the same sets of 200 observations, and
the results can be tabulated in just the same way as we did
in Fig.\ 2 (showing which KS values are beyond the 99-percent
level, etc.); the results in this case are shown in Fig.\ 5. Note
that Generator D (Lehmer's original method) shows up very badly in Fig.\
5, while chi-square tests {\sl on the very same data} revealed
no difficulty in Fig.\ 2; contrariwise, Generator E (the Fibonacci
method) does not look so bad in Fig.\ 5. The good generators,
A and B, passed all tests satisfactorily. The reasons for the
discrepancies between Fig.\ 2 and Fig.\ 5 are primarily that (a)
the number of observations, 200, is really not large enough
for a powerful test, and (b) the ``reject,'' ``suspect,'' ``almost
suspect'' ranking criterion is itself suspect.
(Incidentally, it is not fair to blame Lehmer for using a ``bad'' random-number
generator in the 1940s, since his actual use of Generator D was
quite valid. The ENIAC computer was a highly parallel machine,
programmed by means of a plugboard; Lehmer set it up so that
one of its accumulators was repeatedly multiplying its own contents
by 23, mod $(10↑8 + 1)$, yielding a new value every few microseconds.
Since this multiplier 23 is too small, we know that each value
obtained by such a process was too strongly related to the preceding
value to be considered sufficiently random; but the durations
of time between actual {\sl uses} of the values in the special accumulator
by the accompanying program were comparatively long and subject to some
fluctuation. So the effective multiplier was $23↑k$ for large,
{\sl varying} values of $k$.)
%folio 68 galley 4 (C) Addison-Wesley 1978 *
\subsectionbegin{C. History, bibliography, and theory} The chi-square
test was introduced by Karl Pearson in 1900 [{\sl Philosophical
Magazine}, Series 5, {\bf 50}, 157--175]. Pearson's important
paper is regarded as one of the foundations of modern statistics,
since before that time people would simply plot experimental
results graphically and assert that they were correct. In his
paper, Pearson gave several interesting examples of the previous
misuse of statistics; and he also proved that certain runs at
roulette (which he had experienced during two weeks at Monte
Carlo in 1892) were so far from the expected frequencies that
odds against the assumption of an honest wheel were some $10↑{29}$
to one! A general discussion of the chi-square test and an extensive
bibliography appear in the survey article by William G. Cochran,
{\sl Annals Math.\ Stat.\ \bf 23} (1952), 315--345.
Let us now consider a brief derivation of the theory behind the
chi-square test. The exact probability that $Y↓1 = y↓1, \ldotss
, Y↓k = y↓k$ is easily seen to be
$${n!\over y↓1! \ldotsm y↓k!} p↑{y↓1}↓{1} \ldotss p↑{y↓k}↓{k}.\eqno(17)$$
If we assume that $Y↓s$ has the value $y↓s$ with
the Poisson probability
$${e↑{-np↓s}(np↓s)↑{y↓s} \over y↓s!},$$
and that the $Y$'s are independent, then $(Y↓1,
\ldotss , Y↓k)$ will equal $(y↓1, \ldotss , y↓k)$ with probability
$$\prod ↓{1≤s≤k}{e↑{-np↓s}(np↓s)↑{y↓s} \over
y↓s!},$$
and $Y↓1 + \cdots + Y↓k$ will equal $n$ with probability
$$\sum ↓{\scriptstyle y↓1+\cdots+y↓k=n \atop \scriptstyle y↓1,\ldots,y↓k≥0}
\quad\prod↓{1≤s≤k}
{e↑{-np↓s}(np↓s)↑{y↓s}\over y↓s!}
={e↑{-n}n↑n\over n!}.$$
If we assume that they are independent {\sl except}
for the condition $Y↓1 + \cdots + Y↓k = n$, the probability
that $(Y↓1, \ldotss , Y↓k) = (y↓1, \ldotss , y↓k)$ is the quotient
$$\bigglp\prod ↓{1≤s≤k}{e↑{-np↓s}(np↓s)↑{y↓s}\over y↓s!}\biggrp
\left/\left(e↑{-n}n↑n\over n!\right)\right.,$$
which equals (17). {\sl We may therefore regard
the $Y$'s as independently Poisson distributed, except for the
fact that they have a fixed sum.}
It is convenient to make a change of variables,
$$Z↓s = {Y↓s - np↓s\over\sqrt{np↓s}},\eqno(18)$$
so that $V = Z↑{2}↓{1} + \cdots + Z↑{2}↓{k}$.
The condition $Y↓1 + \cdots + Y↓k = n$ is equivalent
to requiring that
$$\sqrt{p↓1}Z↓1 + \cdots + \sqrt{p↓k}Z↓k = 0.\eqno (19)$$
Let us consider the $(k - 1)$-dimensional space
$S$ of all vectors $(Z↓1, \ldotss , Z↓k)$ such that (19) holds.
For large values of $n$, each $Z↓s$ has approximately the normal
distribution (cf.\ exercise 1.2.10--16); therefore points in
a differential volume $dZ↓2 \ldotsm dZ↓k$ of $S$ occur with
probability {\sl approximately} proportional to $\exp\left(-(Z↑{2}↓{1}
+ \cdots + Z↑{2}↓{k})/2\right)$. (It is at this point in the derivation
that the chi-square method becomes only an approximation for
large $n$.) The probability that $V ≤ v$ is now
$$\int ↓{(Z↓1,\ldotss,Z↓k)\,\char'151\char'156\,S\,\char'141\char'156\char'144\,
Z↑{2}↓{1}+\cdots+Z↑{2}↓{k}≤v}\,
\exp\left(-(Z↑{2}↓{1} + \cdots + Z↑{2}↓{k})/2\right)\, dZ↓2 \ldotsm
dZ↓k \over
\int ↓{(Z↓1,\ldotss,Z↓k)\,\char'151\char'156\,S}\,
\exp\left(-(Z↑{2}↓{1} + \cdots + Z↑{2}↓{k})/2\right)\, dZ↓2 \ldotsm
dZ↓k \eqno(20)$$
Since the hyperplane (19) passes through the origin
of $k$-dimensional space, the numerator in (20) is an integration
over the interior of a $(k - 1)$-dimensional hypersphere centered
at the origin. An appropriate transformation to generalized
polar coordinates with radius $\chi$ and angles $\omega ↓1$,
$\ldotss$, $\omega ↓{k-2}$ transforms (20) into
$$\int ↓{\chi ↑2≤v}\,e↑{-\chi↑2/2}\chi ↑{k-2}f(\omega
↓1, \ldotss , \omega ↓{k-2}) \,d\chi\, d\omega ↓1 \ldotsm d\omega
↓{k-2} \over
\int \,e↑{-\chi↑2/2}\chi ↑{k-2}f(\omega
↓1, \ldotss , \omega ↓{k-2}) \,d\chi\, d\omega ↓1 \ldotsm d\omega
↓{k-2}$$
for some function $f$ (see exercise 15); then integration
over the angles $\omega ↓1$, $\ldotss$, $\omega ↓{k-2}$ gives a
constant factor which cancels from numerator and denominator.
We finally obtain the formula
$$\int ↑{\sqrt v}↓0 e↑{-\chi↑2/2}\chi ↑{k-2}\, d\chi
\over
\int ↑∞↓0 e↑{-\chi↑2/2}\chi ↑{k-2}\, d\chi
\eqno(21)$$
for the approximate probability that $V ≤ v$.
The above derivation uses the symbol $\chi$ to stand for the
radial length, just as Pearson did in his original paper; this
is how the $\chi ↑2$ test got its name. Substituting $t = \chi
↑2/2$, the integrals can be expressed in terms of the incomplete
gamma function, which we discussed in Section 1.2.11.3:
$$\lim↓{n→∞}\hjust{probability that }(V ≤ v) = \gamma
\left({k-1\over2},{v\over 2}\right)\left/\Gamma \left(k-1\over2\right)\right.
.\eqno (22)$$
This is the definition of the chi-square distribution
with $k - 1$ degrees of freedom.
\yskip We now turn to the KS test. In 1933, A. N. Kolmogorov proposed
a test based on the statistic
$$K↓n = \sqrt n \max↓{
-∞<x<+∞}\left|F↓n(x) - F(x)\right| = \max(K↑{+}↓{n}, K↑{-}↓{n}).\eqno
(23)$$
N. V. Smirnov gave several modifications of this
test in 1939, including the individual examination of $K↑{+}↓{n}$
and $K↑{-}↓{n}$ as we have suggested above. There is a large
family of similar tests, but the $K↑{+}↓{n}$ and $K↑{-}↓{n}$
statistics seem to be most convenient for computer application. A
comprehensive review of the literature concerning KS tests and
their generalizations, including an extensive bibliography,
appears in a monograph by J. Durbin, {\sl Regional Conf.\ Series on Applied
Math.\ \bf 9} (SIAM, 1973).
To study the distribution of $K↑{+}↓{n}$ and $K↑{-}↓{n}$, we
begin with the following basic fact: {\sl If $X$ is a random variable
with the continuous distribution $F(x)$, then $F(X)$ is a uniformly
distributed real number between $0$ and $1$.} To prove this,
we need only verify that if $0 ≤ y ≤ 1$ we have $F(X) ≤ y$ with
probability $y$. Since $F$ is continuous, $F(x↓0) = y$ for some
$x↓0$; thus the probability that $F(X) ≤ y$ is the probability
that $X ≤ x↓0$. By definition, the latter probability is $F(x↓0)$,
that is, it is $y$.
Let $Y↓j = nF(X↓j)$, for $1 ≤ j ≤ n$, where the $X$'s have been
sorted as in Step 2 above. Then the variables $Y↓j$
are essentially the same as independent, uniformly distributed random
numbers between 0 and 1 which have been sorted into nondecreasing order,
$Y↓1 ≤ Y↓2 ≤ \cdots ≤ Y↓n$; and the first equation of (13)
may be transformed into
$$K↑{+}↓{n} = {1\over \sqrt{n}}
\max(1 - Y↓1, 2 - Y↓2, \ldotss , n - Y↓n).$$
If $0 ≤ t ≤ n$, the probability that $K↑{+}↓{n}
≤ t\,/\sqrt{n}$ is therefore the probability that $Y↓j ≥ j - t$
for $1 ≤ j ≤ n$. This is not hard to express in terms of $n$-dimensional
integrals,
$${\int↓{α↓n}↑n\,dY↓n\int↓{α↓{n-1}}↑{Y↓n}\,dY↓{n-1}\;\ldots\int↓{α↓1}↑{Y↓2}\,dY↓1
\over
\int↓0↑n\,dY↓n\int↓0↑{Y↓n}\,dY↓{n-1}\;\ldots\int↓0↑{Y↓2}\,dY↓1},
\qquad\hjust{where}\qquad α↓j=\max(j-t,0).\eqno(24)$$
The denominator here is immediately evaluated:
it is found to be $n↑n/n!$, which makes sense since the hypercube
of all vectors $(Y↓1, Y↓2, \ldotss , Y↓n)$ with $0 ≤ Y↓j < n$
has volume $n↑n$, and it can be divided into $n!$ equal parts
corresponding to each possible ordering of the $Y$'s. The integral
in the numerator is a little more difficult, but it yields to
the attack suggested in exercise 17, and we get the general
formula
$$\hjust{probability that }\left(K↑{+}↓{n} ≤ {t\over\sqrt n}\right) =
{t\over n↑n}\sum↓{0≤k≤t}{n \choose k}(k - t)↑k(t + n - k)↑{n-k-1}.\eqno
(25)$$
The distribution of $K↑{-}↓{n}$ is exactly the
same. Equation (25) was first obtained by Z. W. Birnbaum and
Fred H. Tingey [{\sl Annals Math.\ Stat.\ \bf 22} (1951), 592--596];
it may be used to extend Table 2.
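Equation (25) is easy to evaluate by machine; the following Python sketch checks,
for instance, the $n=20$, $p=75\%$ entry of Table 2:

    from math import comb, floor, sqrt

    def ks_plus_cdf(n, x):
        """Exact probability that K+_n <= x, from Eq. (25) with t = x*sqrt(n)."""
        t = x * sqrt(n)
        if t >= n:
            return 1.0
        s = sum(comb(n, k) * (k - t)**k * (t + n - k)**(n - k - 1)
                for k in range(floor(t) + 1))
        return t * s / n**n

    print(ks_plus_cdf(20, 0.7975))    # approximately 0.75, as in Table 2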
In his original paper, Smirnov proved that
$$\lim↓{n→∞}\hjust{probability that }\left(K↑{+}↓{n} ≤ s\right) = 1 - e↑{-2s↑2},
\qquad\hjust{if}\qquad s ≥ 0.\eqno (26)$$
This together with (25) implies that, for all
$s ≥ 0$, we have
$$\def\\{\sqrt n}\lim↓{n→∞} {s\over\\}\sum↓{\\s<k≤n}{n\choose k}
\left({k\over n}-{s\over\\}\right)↑k
\left({s\over\\}+1-{k\over n}\right)↑{n-k-1}=e↑{-2s↑2}.\eqno(27)$$
The more precise asymptotic formulas in Table 2
follow from results obtained by D. A. Darling [{\sl Theory of
Prob.\ and Appl.\ \bf 5} (1960), 356--361], who proved among
other things that $K↑+↓n≤s$ with probability
$$1 - e↑{-2s↑2}\left(1- {2\over 3}s/\sqrt{n} + O(1/n)\right)\eqno (28)$$
for all fixed $s ≥ 0$.
%folio 72 galley 5 (C) Addison-Wesley 1978 *
\exbegin{EXERCISES}
\exno 1. [00] What line
of the chi-square table should be used to check whether or not
the value $V = 7{7\over 48}$ of Eq.\ (5) is improbably high?
\exno 2. [20] If two dice are ``loaded'' so that, on one die,
the value 1 will turn up exactly twice as often as any of the
other values, and the other die is similarly biased towards
6, compute the probability $p↓s$ that a total of exactly $s$ will
appear on the two dice, for $2 ≤ s ≤ 12$.
\trexno 3. [23] Some dice which were loaded as described in the
previous exercise were rolled 144 times, and the following values
were observed:
$$\vcenter{
\tabskip 8pt\halign{\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt
{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}⊗\rt{#}\cr
value of $s =$⊗2⊗3⊗4⊗5⊗6⊗7⊗8⊗9⊗10⊗11⊗12\cr
\noalign{\vskip 3pt}
observed number, $Y↓s =$⊗2⊗6⊗10⊗16⊗18⊗32⊗20⊗13⊗16⊗9⊗2\cr}}$$
Apply the chi-square test to {\sl these} values, using
the probabilities in (1), pretending it is not known that the
dice are in fact faulty. Does the chi-square test detect the
bad dice? If not, explain why not.
\trexno 4. [23] The author actually obtained the data in experiment
1 of (9) by simulating dice in which one was normal, the other
was loaded so that it always turned up 1 or 6. (The latter two
possibilities were equally probable.) Compute the probabilities
which replace (1) in this case, and by using a chi-square test
decide if the results of that experiment are consistent with
the dice being loaded in this way.
\exno 5. [22] Let $F(x)$ be the uniform distribution, Fig.\ 3(b).
Find $K↑{+}↓{20}$ and $K↑{-}↓{20}$ for the following 20 observations:
$$\eqalign{0.414, 0.732, 0.236, 0.162, 0.259, 0.442, 0.189, 0.693, 0.098,
0.302,\cr
\noalign{\vskip 3pt}
0.442, 0.434, 0.141, 0.017, 0.318, 0.869, 0.772, 0.678, 0.354, 0.718,\cr}$$
and state whether these observations are significantly
different from expected behavior with respect to either of these
two tests.
\exno 6. [M20] Consider $F↓n(x)$, as given in Eq.\ (10), for
fixed $x$. What is the probability that $F↓n(x) = s/n$, given
an integer $s?$ What is the mean value of $F↓n(x)?$ What is
the standard deviation?
\exno 7. [M15] Show that $K↑{+}↓{n}$
and $K↑{-}↓{n}$ can never be negative. What is the largest possible
value $K↑{+}↓{n}$ can be?
\exno 8. [00] The text describes an experiment in which 20 values
of the statistic $K↑{+}↓{10}$ were obtained in the study of
a random sequence. These values were plotted, to obtain Fig.\
4, and a KS statistic was computed from the resulting graph.
Why were the table entries for $n = 20$ used to study the resulting
statistic, instead of the table entries for $n = 10?$
\trexno 9. [20] The experiment described in the text consisted
of plotting 20 values of $K↑{+}↓{10}$, computed from the ``maximum
of 5'' test applied to different parts of a random sequence.
We could have computed also the corresponding 20 values of $K↑{-}↓{10}$;
since $K↑{-}↓{10}$ has the same distribution as $K↑{+}↓{10}$,
we could lump together the 40 values thus obtained (that is,
20 of the $K↑{+}↓{10}$'s and 20 of the $K↑{-}↓{10}$'s), and
a KS test could be applied so that we would get new values $K↑{+}↓{40},
K↑{-}↓{40}$. Discuss the merits of this idea.
\trexno 10. [20] Suppose a chi-square test is done by making $n$
observations, and the value $V$ is obtained. Now we repeat the
test on these same $n$ observations over again (getting, of
course, the same results), and we put together the data from
both tests, regarding it as a single chi-square test with $2n$
observations. (This procedure violates the text's stipulation
that all of the observations must be independent of one another.)
How is the second value of $V$ related to the first one?
\exno 11. [10] Solve exercise 10 substituting the KS test for
the chi-square test.
\exno 12. [M28] Suppose a chi-square test is made on a set of
$n$ observations, assuming that $p↓s$ is the probability that
each observation falls into category $s$; but suppose that in
actual fact the observations have probability $q↓s ≠ p↓s$ of
falling into category $s$. (Cf.\ exercise 3.) We would, of course,
like the chi-square test to detect the fact that the $p↓s$ assumption
was incorrect. Show that this {\sl will} happen, if $n$ is large
enough. Prove also the analogous result for the KS test.
\exno 13. [M24] Prove that Eqs.\ (13) are equivalent to Eqs.\
(11).
\trexno 14. [HM26] Let $Z↓s$ be given by Eq.\ (18). Show directly
by using Stirling's approximation that the multinomial probability
$$n!p↑{Y↓1}↓1\ldotss p↑{Y↓k}↓{k}/Y↓1! \ldotsm Y↓k! = e↑{-V/2}/\sqrt{(2nπ)↑{k-1}p↓1
\ldotsm p↓k} + O(n↑{-k/2}),$$
if $Z↓1, Z↓2, \ldotss , Z↓k$ are bounded as $n →
∞$. (This idea leads to a proof of the chi-square test which
is much closer to ``first principles,'' and requires less handwaving,
than the derivation in the text.)
\exno 15. [HM24] Polar coordinates in two dimensions are conventionally
taken to be $x = r \cos \theta$ and $y = r \sin \theta $.
For the purposes of integration, we have $dx \,dy = r\, dr\, d\theta
$. More generally, in $n$-dimensional space we can let
$$x↓k = r \sin \theta ↓1 \ldotsm \sin \theta ↓{k-1}
\cos \theta ↓k,\qquad 1 ≤ k < n,\qquad x↓n = r \sin \theta
↓1 \ldotsm \sin \theta ↓{n-1}.$$
Show that in this case
$$dx↓1\,dx↓2 \ldotsm dx↓n = \left|r↑{n-1} \sin↑{n-2}\theta
↓1 \ldotsm \sin \theta ↓{n-2}\, dr\, d\theta ↓1 \ldotsm d\theta ↓{n-1}\right|.$$
\trexno 16. [HM35] Generalize Theorem
1.2.11.3A to find the value of
$$\gamma (x + 1, x + z\sqrt{2x} + y)/\Gamma (x
+ 1),$$
for large $x$ and fixed $y, z$. Disregard terms
of the answer which are $O(1/x)$. Use this result to find the
approximate solution, $t$, to the equation
$$\gamma \left({\nu \over 2} , {t\over 2}\right)\left/\Gamma
\left(\nu \over 2\right)\right. = p,$$
for large $\nu$ and fixed $p$, thereby accounting
for the asymptotic formulas indicated in Table 1.\xskip [{\sl Hint:}
See exercise 1.2.11.3--8.]
\exno 17. [HM26] Let $t$ be a fixed real number. For $0 ≤ k
≤ n$, let
$$P↓{nk}(x) = \int↑{x}↓{n-t}\,dx↓n\int↑{x↓n}↓{n-1-t}\,dx↓{n-1}\;\ldots
\int↑{x↓{k+2}}↓{k+1-t}\,dx↓{k+1}\int↑{x↓{k+1}}↓{0}\,dx↓k\;\ldots\int↑{x↓2}↓{0}dx↓1
;$$
by convention, let $P↓{00}(x) = 1$. Prove the following
relations:
\vskip 11pt plus 3pt minus 8pt
\textindent{a)}$\dispstyle P↓{nk}(x) =\int↑{x+t}↓{n}\,dx↓n\int↑{x↓n}↓{n-1}\,dx↓{n-1}
\;\ldots\int↑{x↓{k+2}}↓{k+1}\,dx↓{k+1}\int↑{x↓{k+1}}↓{t}dx↓k\; \ldots
\int↑{x↓2}↓{t}\,dx↓1$.
\yskip
\textindent{b)}$P↓{n0}(x) = (x + t)↑n/n! - (x + t)↑{n-1}/(n - 1)!$.
\yskip
\textindent{c)}$\dispstyle P↓{nk}(x) - P↓{n(k-1)}(x) = {(k - t)↑k\over
k!} P↓{(n-k)0}(x - k)$, if $1 ≤ k ≤ n$.
\yskip
\textindent{d)}Obtain a general formula for $P↓{nk}(x)$,
and apply it to the evaluation of Eq.\ (24).
\exno 18. [M20] Give a ``simple'' reason why $K↑{-}↓{n}$ has
the same probability distribution as $K↑{+}↓{n}$.
\exno 19. [HM48] Develop tests, analogous to the Kolmogorov--Smirnov
test, for use with multivariate distributions $F(x↓1, \ldotss
, x↓r) =\hjust{probability that }(X↓1 ≤ x↓1, \ldotss , X↓r ≤ x↓r)$.
(Such procedures could be used, for example, in place of the
``serial test'' in the next section.)
\exno 20. [HM41] Deduce further terms of the asymptotic behavior
of the KS distribution, extending (28).
\exno 21. [M40] Although the text states that the KS test should
be applied only when $F(x)$ is a continuous distribution function,
it is, of course, possible to try to compute $K↑{+}↓{n}$ and
$K↑{-}↓{n}$ even when the distribution has jumps. Analyze the probable
behavior of $K↑{+}↓{n}$ and $K↑{-}↓{n}$ for various discontinuous
distributions $F(x)$. Compare the effectiveness of the resulting
statistical test with the chi-square test on several samples
of random numbers.
\exno 23. [M22] (T. Gonzalez, S. Sahni, and W. R. Franta.)\xskip (a)
Suppose that the maximum value in formula (13) for the KS statistic
$K↑{+}↓{n}$ occurs at a given index $j$ where $\lfloor nF(X↓j)\rfloor
= k$. Prove that $F(X↓j) = \max↓{1≤i≤n}\leftset F(X↓i)\relv \lfloor
nF(X↓i)\rfloor = k\rightset$.\xskip (b) Design an algorithm which calculates
$K↑{+}↓{n}$ and $K↑{-}↓{n}$ in $O(n)$ steps (without sorting).
\exno 24. [HM46] Investigate the ``improved'' KS test suggested
in the answer to exercise 6.
%folio 75 galley 6 (C) Addison-Wesley 1978 *
\runningrighthead{EMPIRICAL TESTS} section{3.3.2}
\sectionskip
\sectionbegin{3.3.2. Empirical Tests}
In this section we shall discuss ten kinds of
specific tests that have been applied to sequences in order
to investigate their randomness. The discussion of each test
has two parts: (a) a ``plug-in'' description of how to perform
the test; and (b) a study of the theoretical basis for the test.
(Readers lacking mathematical training may wish to skip over
the theoretical discussions. Conversely, mathematically-inclined
readers may find the associated theory quite interesting, even
if they never intend to test random-number generators, since
some instructive combinatorial questions are involved here.)
Each test is applied to a sequence
$$\langle U↓n\rangle = U↓0, U↓1, U↓2, \ldots\eqno (1)$$
of real numbers, which purports to be independently
and uniformly distributed between zero and one. Some of the
tests are designed primarily for integer-valued sequences, instead
of the real-valued sequence (1). In this case, the auxiliary
sequence
$$\langle Y↓n\rangle = Y↓0, Y↓1, Y↓2, \ldotss ,\eqno (2)$$
which is defined by the rule
$$Y↓n = \lfloor dU↓n\rfloor,\eqno (3)$$
is used instead. This is a sequence of integers
which purports to be independently and uniformly distributed
between 0 and $d - 1$. The number $d$ is chosen for convenience;
for example, we might have $d = 64 = 2↑6$ on a binary computer,
so that $Y↓n$ represents the six most significant bits of the
binary representation of $U↓n$. The value of $d$ should be large
enough so that the test is meaningful, but not so large that
the test becomes impracticably difficult to carry out.
The quantities $U↓n$, $Y↓n$, and $d$ will have the above significance
throughout this section, although the value of $d$ will probably be different
in different tests.
\subsectionbegin{A. Equidistribution test (Frequency test)}
The first requirement that sequence (1) must meet is that its
numbers are, in fact, uniformly distributed between zero and
one. There are two ways to make this test:\xskip (a) Use the Kolmogorov--\hskip 1pt
Smirnov
test, with $F(x) = x$ for $0 ≤ x ≤ 1$.\xskip (b) Let $d$ be a convenient
number, e.g., 100 on a decimal computer, 64 or 128 on a binary
computer, and use the sequence (2) instead of (1). For each
integer $r$, $0 ≤ r < d$, count the number of times that $Y↓j = r$
for $0 ≤ j < n$, and then apply the chi-square test using $k
= d$ and probability $p↓s = 1/d$ for each category.
The theory behind this test has been covered in Section 3.3.1.
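\yskip To make procedure (b) concrete, here is a small illustrative sketch in
Python; the routine name and calling convention are conveniences of this sketch
rather than part of the text, and the value returned is to be referred to the
chi-square distribution with $k-1=d-1$ degrees of freedom as in Section 3.3.1.

def frequency_test(Y, d):
    # Y[0..n-1] are integers with 0 <= Y[j] < d, as in sequence (2).
    n = len(Y)
    count = [0] * d
    for y in Y:
        count[y] += 1
    expected = n / d                     # n * p_s, with p_s = 1/d
    return sum((c - expected) ** 2 / expected for c in count)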
\subsectionbegin{B. Serial test} More generally,
we want pairs of successive numbers to be uniformly distributed
in an independent manner. The sun comes up just about as often as it
goes down, in the long run, but this doesn't make its motion random.
To carry out the serial test, count the number
of times the pair $(Y↓{2j}, Y↓{2j+1}) = (q, r)$ occurs, for
$0 ≤ j < n$; these counts are to be made for each pair of integers
$(q, r)$ with $0 ≤ q, r < d$, and the chi-square test is applied
to these $k = d↑2$ categories with probability $1/d↑2$ in each
category. As with the equidistribution test, $d$ may be
any convenient number, but it will be somewhat smaller
than the values suggested above since a valid chi-square test
should have $n$ large compared to $k$ (say $n ≥ 5d↑2$ at least).
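\yskip A corresponding sketch of the serial test on nonoverlapping pairs follows
(again illustrative only); the value returned is to be compared with the
chi-square distribution having $d↑2-1$ degrees of freedom.

def serial_test(Y, d):
    # Uses Y[0..2n-1]; the pairs (Y[2j], Y[2j+1]) fall into d*d categories.
    n = len(Y) // 2
    count = [[0] * d for _ in range(d)]
    for j in range(n):
        count[Y[2 * j]][Y[2 * j + 1]] += 1
    expected = n / (d * d)               # n * 1/d^2 per category
    return sum((count[q][r] - expected) ** 2 / expected
               for q in range(d) for r in range(d))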
Clearly we can generalize this test to triples, quadruples,
etc., instead of pairs (see exercise 2); however, the value
of $d$ must then be severely reduced in order to avoid having
too many categories. When quadruples and larger numbers of adjacent
elements are considered, we therefore make use of less exact
tests such as the poker test or the maximum test described below.
Note that $2n$ numbers of the sequence (2) are used in this
test in order to make $n$ observations. It would be a mistake
to perform the serial test on the pairs $(Y↓0, Y↓1)$, $(Y↓1, Y↓2)$,
$\ldotss$, $(Y↓{n-1}, Y↓n)$; can the reader see why? We might perform
another serial test on the pairs $(Y↓{2j+1}, Y↓{2j+2})$, and
expect the sequence to pass both tests. Alternatively, I. J.
Good has proved that if $d$ is prime, and if the pairs $(Y↓0,
Y↓1)$, $(Y↓1, Y↓2)$, $\ldotss$, $(Y↓{n-1}, Y↓n)$ are used, and if we
use the usual chi-square method to compute both the statistics
$V↓2$ for the serial test and $V↓1$ for the frequency test
on $Y↓0, \ldotss , Y↓{n-1}$ with the same value of $d$, then
$V↓2 - 2V↓1$ should have the chi-square distribution
with $(d - 1)↑2$ degrees of freedom when $n$ is large.
(See {\sl Proc.\ Cambridge Phil.\ Soc.\
\bf 49} (1953), 276--284; {\sl Annals Math.\ Stat.\ \bf 28} (1957),
262--264.)
\topinsert{\quoteformat{
The equanimity of your average tosser of coins\cr
depends upon a law . . . which ensures that\cr
he will not upset himself by losing too much\cr
nor upset his opponent by winning too often.\cr}
\author{TOM STOPPARD, {\sl Rosencrantz & Guildenstern are Dead} (1966)}}
\subsectionbegin{C. Gap test} Another test is used to
examine the length of ``gaps'' between occurrences of $U↓j$
in a certain range. If $α$ and $β$ are two real numbers with
$0 ≤ α < β ≤ 1$, we want to consider the lengths of consecutive
subsequences $U↓j$, $U↓{j+1}$, $\ldotss$, $U↓{j+r}$ in which $U↓{j+r}$
lies between $α$ and $β$ but the other $U$'s do not. (This subsequence
of $r + 1$ numbers represents a gap of length $r$.)
\algbegin Algorithm G (Data for gap test).
The following algorithm, applied to the sequence (1) for any given values
of $α$ and $β$, counts
the number of gaps of lengths 0, 1, $\ldotss$, $t - 1$ and the number
of gaps of length $≥t$, until $n$ gaps have been tabulated.
\algstep G1. [Initialize.] Set
$j ← -1$, $s ← 0$, and set {\tt COUNT}$[r] ← 0$ for $0 ≤ r ≤ t$.
\algstep G2. [Set $r$ zero.] Set $r ← 0$.
\algstep G3. [$α ≤ U↓j < β$?] Increase $j$ by 1. If $U↓j ≥ α$
and $U↓j < β$, go to step G5.
\algstep G4. [Increase $r$.] Increase $r$ by one, and return
to step G3.
\algstep G5. [Record gap length.] (A gap of length $r$ has now
been found.) If $r ≥ t$, increase {\tt COUNT}$[t]$ by one, otherwise
increase {\tt COUNT}$[r]$ by one.
\algstep G6. [$n$ gaps found?] Increase $s$ by one. If $s <
n$, return to step G2.\quad\blackslug
\yyskip After this algorithm has been performed,
the chi-square test is applied to the $k = t + 1$ values of
{\tt COUNT}[0], {\tt COUNT}[1], $\ldotss$, {\tt COUNT}$[t]$, using the following
probabilities:
$$\halign{\hjust to size{#}\cr
$\quad p↓0 = p,\qquad p↓1 = p(1 - p),\qquad p↓2 = p(1 - p)↑2,\qquad
\ldotss $,\hfill\cr
\hfill$p↓{t-1} = p(1 - p)↑{t-1},\qquad p↓t = (1 - p)↑t.\qquad (4)$\cr}$$
Here $p = β - α$, the probability that $α ≤ U↓j
< β$. The values of $n$ and $t$ are to be chosen, as usual,
so that each of the values of {\tt COUNT}$[r]$ is expected to be 5
or more, preferably more.
The gap test is often applied with $α = 0$ or $β = 1$ in order
to omit one of the comparisons in step G3. The special cases $(α, β)
= (0, {1\over 2})$ or $({1\over 2}, 1)$ give rise to tests which
are sometimes called ``runs above the mean'' and ``runs below
the mean,'' respectively.
The probabilities in Eq.\ (4) are easily deduced, so this derivation
is left to the reader. Note that the gap test as described above
observes the lengths of $n$ {\sl gaps}\/; it does not observe
the gap lengths among $n$ {\sl numbers.} If the sequence $\langle
U↓n\rangle$ is sufficiently nonrandom, Algorithm G may not
terminate. Other gap tests which examine a fixed number
of $U$'s have also been proposed (see exercise 5).
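\yskip The following sketch carries out Algorithm G together with the
chi-square computation based on (4); it is illustrative only, and it assumes
the input sequence is long enough for $n$ gaps to be found (otherwise, as
remarked above, the procedure would not terminate).

def gap_test(U, alpha, beta, n, t):
    # Algorithm G: tabulate n gap lengths, then apply the chi-square test
    # with the probabilities of Eq. (4); the result is to be compared with
    # the chi-square distribution having t degrees of freedom.
    count = [0] * (t + 1)
    j = -1
    for _ in range(n):                       # steps G2--G6
        r = 0
        while True:                          # steps G3--G4
            j += 1
            if alpha <= U[j] < beta:
                break
            r += 1
        count[min(r, t)] += 1                # step G5
    p = beta - alpha
    probs = [p * (1 - p) ** r for r in range(t)] + [(1 - p) ** t]
    return sum((count[r] - n * probs[r]) ** 2 / (n * probs[r])
               for r in range(t + 1))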
\topinsert{\vskip 41mm
\hjust to size{\caption Fig. 6. Gathering data
for the gap test. (Algorithms for the ``coupon-collector's test''
and the ``run test'' are similar.)}}
\subsectionbegin{D. Poker test (Partition test)}
The ``classical'' poker test considers $n$ groups of five successive
integers, $(Y↓{5j}, Y↓{5j+1}, \ldotss , Y↓{5j+4})$ for $0 ≤ j < n$,
and observes which of the following seven patterns is matched by each
quintuple:
$$\tabskip 0pt plus 1000pt\halign to size{\rt{#}\tabskip 0pt⊗\!
\qquad\lft{$ #$}\qquad\qquad⊗\rt{#}⊗\qquad\lft{$ #$}\tabskip 0pt plus 1000pt\cr
All different:⊗abcde⊗Full house:⊗aaabb\cr
\noalign{\penalty1000\vskip 4pt plus 1pt minus 2pt}
One pair:⊗aabcd⊗Four of a kind:⊗aaaab\cr
\noalign{\penalty1000\vskip 4pt plus 1pt minus 2pt}
Two pairs:⊗aabbc⊗Five of a kind:⊗aaaaa\cr
\noalign{\penalty1000\vskip 4pt plus 1pt minus 2pt}
Three of a kind:⊗aaabc\cr}$$
A chi-square test is based on the number of quintuples
in each category.
It is reasonable to ask for a somewhat simpler version of this
test, to facilitate the programming involved. A good compromise
would simply be to count the number of {\sl distinct} values
in the set of five. We would then have five categories:
$$\eqalign{\hjust{5 different}⊗ = \hjust{all different;}\cr
\hjust{4 different}⊗ = \hjust{one pair;}\cr
\hjust{3 different}⊗ = \hjust{two pairs, or three of a kind;}\cr
\hjust{2 different}⊗ = \hjust{full house, or four of a kind;}\cr
\hjust{1 different}⊗ = \hjust{five of a kind.}\cr}$$
This breakdown is easier to determine systematically,
and the test is nearly as good.
In general we can consider $n$ groups of $k$ successive numbers,
and we can count the number of $k$-tuples with $r$ different
values. A chi-square test is then made, using the probability
$$p↓r = {d(d - 1) \ldotsm (d - r + 1)\over d↑k} {k \comb\{\} r}
\eqno (5)$$
that there are $r$ different values. (The Stirling numbers
$k\comb\{\}r$ are defined in Section 1.2.6, and they
can readily be computed using the formulas given there.) Since
the probability $p↓r$ is very small when $r = 1$ or 2, we generally
lump a few categories of low probability together before the
chi-square test is applied.
To derive the proper formula for $p↓r$, we must count how many
of the $d↑k$ $k$-tuples of numbers between 0 and $d - 1$ have
exactly $r$ different elements, and divide the total by $d↑k$.
Since $d(d - 1) \ldotsm (d - r + 1)$ is the number of ordered
choices of $r$ things from a set of $d$ objects, we need only
show that $k\comb\{\}r$ is the number of ways to partition
a set of $k$ elements into exactly $r$ parts. Therefore exercise
1.2.6--64 completes the derivation of Eq.\ (5).
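\yskip For readers who wish to program the test, the probabilities (5) can be
computed as in the following sketch (illustrative only); the Stirling numbers
are obtained from the recurrence given in Section 1.2.6.

def stirling2(k, r):
    # Stirling number of the second kind, by the recurrence
    # S(i,j) = j*S(i-1,j) + S(i-1,j-1).
    S = [[0] * (r + 1) for _ in range(k + 1)]
    S[0][0] = 1
    for i in range(1, k + 1):
        for j in range(1, min(i, r) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[k][r]

def poker_probability(d, k, r):
    # Eq. (5): probability that k successive Y's show exactly r distinct values.
    num = 1
    for i in range(r):
        num *= d - i                     # d(d-1)...(d-r+1)
    return num * stirling2(k, r) / d ** k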
%folio 79 galley 7 (C) Addison-Wesley 1978 *
\subsectionbegin{E. Coupon collector's test} This test is related
to the poker test somewhat as the gap test is related to the
frequency test. The sequence $Y↓0$, $Y↓1$, $\ldots$ is used, and
we observe the lengths of segments $Y↓{j+1}$, $Y↓{j+2}$, $\ldotss$,
$Y↓{j+r}$ required to get a ``complete set'' of integers from
0 to $d - 1$. Algorithm C describes this precisely:
\algbegin Algorithm C (Data for coupon collector's test).
Given a sequence of integers $Y↓0$, $Y↓1$, $\ldotss $, with $0 ≤
Y↓j < d$, this algorithm counts the lengths of $n$ consecutive
``coupon collector'' segments. At the conclusion of the algorithm,
{\tt COUNT}$[r]$ contains the number of segments with length $r$,
for $d ≤ r < t$, and {\tt COUNT}$[t]$ contains the number with length
$≥t$.
\algstep C1. [Initialize.] Set $j ← -1$, $s
← 0$, and set {\tt COUNT}$[r] ← 0$ for $d ≤ r ≤ t$.
\algstep C2. [Set $q, r$ zero.] Set $q ← r ← 0$, and set {\tt OCCURS}$[k]
← 0$ for $0 ≤ k < d$.
\algstep C3. [Next observation.] Increase $r$ and $j$ by 1.
If {\tt OCCURS}$[Y↓j] ≠ 0$, repeat this step.
\algstep C4. [Complete set?] Set {\tt OCCURS}$[Y↓j] ← 1$ and $q ← q + 1$.
(The subsequence observed so far contains $q$ distinct values;
if $q = d$, we therefore have a complete set.) If $q < d$, return
to step C3.
\algstep C5. [Record the length.] If $r ≥ t$, increase {\tt COUNT}$[t]$
by one, otherwise increase {\tt COUNT}$[r]$ by one.
\algstep C6. [$n$ found?] Increase $s$ by one. If $s < n$, return
to step C2.\quad\blackslug
\yyskip For an example of this algorithm,
see exercise 7. We may think of a boy collecting $d$ types of
coupons, which are randomly distributed in his breakfast cereal
boxes; he must keep eating more cereal until he has one coupon
of each type.
A chi-square test is to be applied to {\tt COUNT}$[d]$, {\tt COUNT}$[d +
1]$, $\ldotss$, {\tt COUNT}$[t]$, with $k = t - d + 1$, after Algorithm C
has counted $n$ lengths. The corresponding probabilities are
$$p↓r = {d!\over d↑r}{r-1\comb\{\}d-1},\qquad
d ≤ r < t;\qquad p↓t = 1 - {d!\over d↑{t-1}}{t-1\comb\{\}d}.\eqno(6)$$
To derive these probabilities, we simply note that
if $q↓r$ denotes the probability that a subsequence of length
$r$ is {\sl incomplete}, then
$$q↓r = 1 - {d!\over d↑r}{r\comb\{\}d}$$
by Eq.\ (5); for this means we have an $r$-tuple
of elements which do not have all $d$ different values. Then
(6) follows from the relations $p↓r = q↓{r-1} - q↓r$ for $d
≤ r < t$;\xskip$p↓t = q↓{t-1}$.
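\yskip In programmed form the data-gathering part of this test might look as
follows (an illustrative sketch only; the chi-square computation with the
probabilities (6) is then carried out just as in the gap test).

def coupon_segments(Y, d, n, t):
    # Algorithm C: lengths of n consecutive "coupon collector" segments;
    # count[r] = number of segments of length r (d <= r < t), and
    # count[t] = number of segments of length >= t.
    count = [0] * (t + 1)
    j = -1
    for _ in range(n):                       # steps C2--C6
        occurs = [False] * d
        q = r = 0
        while q < d:                         # steps C3--C4
            j += 1
            r += 1
            if not occurs[Y[j]]:
                occurs[Y[j]] = True
                q += 1
        count[min(r, t)] += 1                # step C5
    return count                             # entries below index d remain zero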
For formulas which arise in connection with {\sl generalizations}
of the coupon collector's test, see exercises 9 and 10 and also
the paper by Hermann von Schelling, {\sl AMM \bf 61} (1954), 306--311.
\subsectionbegin{F. Permutation test} Divide the
input sequence into $n$ groups of $t$ elements each, that is,
$(U↓{jt}, U↓{jt+1}, \ldotss , U↓{jt+t-1})$ for $0 ≤ j < n$. The elements in
each group can have $t!$ possible relative orderings; the number
of times each ordering appears is counted, and a chi-square
test is applied with $k = t!$ and with probability $1/t!$ for
each ordering.
For example, if $t = 3$ we would have six categories, according
to whether $U↓{3j} < U↓{3j+1} < U↓{3j+2}$ or $U↓{3j} < U↓{3j+2}
< U↓{3j+1}$ or $\cdots$ or $U↓{3j+2} < U↓{3j+1} < U↓{3j}$. We
assume in this test that equality between $U$'s does not occur;
such an assumption is justified, for the probability that two
$U$'s are equal is zero.
A convenient way to perform the permutation test on a computer makes use of
the following algorithm, which is of interest in itself:
\algbegin Algorithm P (Analyze a permutation). Given
a set of distinct elements $(U↓1, \ldotss , U↓t)$, we compute
an integer $f(U↓1, \ldotss , U↓t)$ such that
$$0 ≤ f(U↓1, \ldotss , U↓t) < t!,$$
and $f(U↓1, \ldotss , U↓t) = f(V↓1, \ldotss , V↓t)$
if and only if $(U↓1, \ldotss , U↓t)$ and $(V↓1, \ldotss , V↓t)$
have the same relative ordering.
\algstep P1. [Initialize.] Set $r ← t$, $f
← 0$. (During this algorithm we will have $0 ≤ f < t!/r!$.)
\algstep P2. [Find maximum.] Find the maximum of $\{U↓1, \ldotss
, U↓r\}$, and suppose that $U↓s$ is the maximum. Set $f
← r \cdot f + s - 1$.
\algstep P3. [Exchange.] Exchange $U↓r ↔ U↓s$.
\algstep P4. [Decrease $r$.] Decrease $r$ by
one. If $r > 1$, return to step P2.\quad\blackslug
\yyskip Note that the sequence ($U↓1, \ldotss , U↓t)$
will have been sorted into ascending order when this algorithm
stops. To prove that the result $f$ uniquely characterizes the
{\sl initial} order of $(U↓1, \ldots , U↓t)$, we note that Algorithm
P can be run backwards: For $r = 2$, 3, $\ldotss$, $t$, set $s ←
f \mod r$, $f ← \lfloor f/r\rfloor$, and exchange $U↓r ↔ U↓{s+1}$. It is
easy to see that this will undo the effects of steps P2--P4;
hence no two permutations can yield the same value of $f$, and
Algorithm P performs as advertised.
The essential idea which underlies Algorithm P is a mixed-radix
representation called the ``factorial number system'': Every
integer in the range $0 ≤ f < t!$ can be uniquely written
in the form
$$\eqalignno{f⊗=\biglp \ldots (c↓{t-1} \times (t - 1) + c↓{t-2})
\times (t - 2) + \cdots + c↓2\bigrp \times 2 + c↓1\cr
\noalign{\vskip 3pt} ⊗=
(t - 1)!c↓{t-1} + (t - 2)!c↓{t-2} + \cdots + 2!c↓2 + 1!c↓1⊗(7)\cr}$$
where the ``digits'' $c↓j$ are integers satisfying
$$0 ≤ c↓j ≤ j,\qquad\hjust{for } 1 ≤ j < t.\eqno(8)$$
In Algorithm P, $c↓{r-1} = s - 1$ when step P2
is performed for a given value of $r$.
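\yskip The following sketch (illustrative only, with 0-origin indexing, so that
the ``digit'' stored at radix $r$ is $s-1$ in the notation of the text) carries
out Algorithm P and the reverse procedure just described.

def permutation_index(U):
    # Algorithm P: 0 <= f < t!, and f depends only on the relative order of U.
    U = list(U)
    t = len(U)
    f = 0
    for r in range(t, 1, -1):                          # steps P2--P4
        s = max(range(r), key=lambda i: U[i])          # 0-origin position of the max
        f = r * f + s                                  # the "digit" is s (= s-1 in the text)
        U[r - 1], U[s] = U[s], U[r - 1]                # step P3
    return f

def index_to_pattern(f, t):
    # Run Algorithm P backwards, producing a permutation of {0,...,t-1}
    # whose relative order yields the value f.
    U = list(range(t))
    for r in range(2, t + 1):
        f, s = divmod(f, r)
        U[r - 1], U[s] = U[s], U[r - 1]
    return U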
\subsectionbegin{G. Run test} A sequence may also be tested
for ``runs up'' and ``runs down.'' This means we examine the
length of {\sl monotone} subsequences of the original sequence,
i.e., segments which are increasing or decreasing.
As an example of the precise definition of a run, consider the
sequence of ten numbers ``1298536704''; putting a vertical line
at the left and right and between $X↓j$ and $X↓{j+1}$ whenever
$X↓j > X↓{j+1}$, we obtain
$$\def\0{\hjust to 7pt{}}
\def\\{\hjust to 7pt{\hfill\vrule height 9.944pt depth 3pt\hfill}}
\\1\02\09\\8\\5\\3\06\07\\0\04\\\eqno(9)$$
which displays the ``runs up'': there is a run
of length 3, followed by two runs of length 1, followed by another
run of length 3, followed by a run of length 2. The algorithm
of exercise 12 shows how to tabulate the length of ``runs up.''
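\yskip The tabulation called for in exercise 12 can be sketched as follows
(illustrative only); applied to the sequence 1298536704 it yields two runs of
length 1, one of length 2, and two of length 3, in agreement with (9).

def run_up_lengths(X):
    # count[r] = number of ascending runs of length r, for 1 <= r <= 5;
    # count[6] = number of runs of length 6 or more.  The X's are assumed distinct.
    count = [0] * 7
    r = 1
    for j in range(1, len(X)):
        if X[j - 1] < X[j]:
            r += 1
        else:
            count[min(r, 6)] += 1            # a run has just ended
            r = 1
    count[min(r, 6)] += 1                    # the final run
    return count[1:]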
Unlike the gap test and the coupon collector's test (which are
in many other respects similar to this test), {\sl we should not apply
a chi-square test to the above data}, since adjacent runs are
{\sl not} independent. A long run will tend to be followed by
a short run, and conversely. This lack of independence is enough
to invalidate a straightforward chi-square test. Instead, the
following statistic may be computed, when the run lengths have
been determined as in exercise 12:
$$V = {1\over n}\sum ↓{1≤i,j≤6}(\hjust{\tt COUNT}[i] - nb↓i)
(\hjust{\tt COUNT}[j] - nb↓j)a↓{ij},\eqno(10)$$
where $n$ is the length of the sequence, and the
coefficients $a↓{ij}$ and $b↓i$ are equal to
$$\eqalign{\left(\vcenter{\halign
{$a↓{#}\;$⊗$a↓{#}\;$⊗$a↓{#}\;$⊗$a↓{#}\;$⊗$a↓{#}\;$⊗$a↓{#}$\cr
11⊗12⊗13⊗14⊗15⊗16\cr
21⊗22⊗23⊗24⊗25⊗26\cr
31⊗32⊗33⊗34⊗35⊗36\cr
41⊗42⊗43⊗44⊗45⊗46\cr
51⊗52⊗53⊗54⊗55⊗56\cr
61⊗62⊗63⊗64⊗65⊗66\cr}}\right)⊗=\left(\vcenter{\halign
{\rt{#}⊗ \rt{#}⊗ \rt{#}⊗ \rt{#}⊗ \rt{#}⊗ \rt{#}\cr
4529.4⊗9044.9⊗13568⊗18091⊗22615⊗27892\cr
9044.9⊗18097⊗27139⊗36187⊗45234⊗55789\cr
13568⊗27139⊗40721⊗54281⊗67852⊗83685\cr
18091⊗36187⊗54281⊗72414⊗90470⊗111580\cr
22615⊗45234⊗67852⊗90470⊗113262⊗139476\cr
27892⊗55789⊗83685⊗111580⊗139476⊗172860\cr}}\right)\cr
\noalign{\hjust to size{\hfill (11)}}
(b↓1\quad b↓2\quad b↓3\quad b↓4\quad b↓5\quad
b↓6)⊗= ({1\over 6}\quad {5\over 24}\quad {11\over 120}\quad
{19\over 720}\quad {29\over 5040}\quad {1\over 840}).\cr}$$
(The values of $a↓{ij}$ shown here are
approximate only; exact values may be
obtained by using formulas derived below.) {\sl The statistic
$V$ in (10) should have the chi-square distribution with six} (not
five) {\sl degrees of freedom}, when $n$ is large. The value
of $n$ should be, say, 4000 or more. The same test can be applied
to ``runs down.''
%Extra lines set by mistake on galley 1 *
\yskip A vastly simpler and more practical run test appears in exercise 14,
so a reader who is only interested in testing random-number generators
should skip the next few pages and go on to the ``maximum of $t$ test'' after
looking at exercise 14. On the other hand it is instructive from
a mathematical standpoint to see how a complicated run test with
interdependent runs can be treated, so we shall now digress for a moment.
Given any permutation on $n$ elements,
let $Z↓{pi} = 1$ if position $i$ is the beginning of an ascending
run of length $p$ or more, and let $Z↓{pi} = 0$ otherwise. For
example, consider the permutation (9) with $n = 10$; we have
$$Z↓{11} = Z↓{21} = Z↓{31} = Z↓{14} = Z↓{15} = Z↓{16} = Z↓{26}
= Z↓{36} = Z↓{19} = Z↓{29} = 1,$$
and all other $Z$'s are zero. With this notation,
$$R↑\prime↓{p} = Z↓{p1} + Z↓{p2} + \cdots + Z↓{pn}\eqno
(12)$$
is the number of runs of length $≥p$, and
$$R↓p = R↑\prime↓{p} - R↑\prime↓{p+1}\eqno (13)$$
is the number of runs of length $p$ exactly. Our
goal is to compute the mean value of $R↓p$, and also the {\sl
covariance}
$$\hjust{covar}(R↓p, R↓q) =\hjust{mean}\biglp\biglp R↓p -\hjust{mean}(R↓p)\bigrp
\biglp R↓q -\hjust{mean}(R↓q)\bigrp\bigrp,$$
which measures the interdependence of $R↓p$
and $R↓q$. These mean values are to be computed as the average
over the set of all $n!$ permutations.
Equations (12) and (13) show that the answers can be expressed
in terms of the mean values of $Z↓{pi}$ and of $Z↓{pi}Z↓{qj}$,
so as the first step of the derivation we obtain the following
results (assuming that $i < j$):
$$\save1\hjust{$\vcenter{\halign{\hjust to 35mm{$#$\hfill}⊗\lft{$#$}\cr
(p + \delta ↓{i1})/(p
+ 1)!,⊗\hjust{if }i ≤ n - p + 1;\cr
0,⊗\hjust{otherwise.}\cr}}$}
\eqalign{{1\over n!}\sum Z↓{pi} ⊗ = \left\{
\box1\right.\cr
\noalign{\vskip 4pt}
{1\over n!}\sum Z↓{pi}Z↓{qj}⊗=\left\{
\vcenter{\halign{$#$\hfill\cr
(p + \delta ↓{i1})q/(p + 1)! (q + 1)!,\cr
\hjust to 35mm{}\hjust{if }i + p < j ≤ n - q + 1;\cr
\noalign{\vskip 3pt}
(p + \delta ↓{i1})/(p + 1)! q! - (p + q + \delta ↓{i1})/(p + q + 1)!,\cr
\hjust to 35mm{}\hjust{if }i + p = j ≤ n - q + 1;\cr
\noalign{\vskip 3pt}
\hjust to 35mm{0,\hfill}\hjust{otherwise.}\cr}}\right.\cr}\eqno(14)$$
The $\sum$-signs stand for a summation over all possible permutations.
To illustrate the calculations involved here, we will work the
most difficult case, when $i + p = j ≤ n - q + 1$, and when
$i > 1$. Note that $Z↓{pi}Z↓{qj}$ is either zero or one, so
the summation consists of counting all permutations $U↓1 U↓2
\ldotsm U↓n$ for which $Z↓{pi} = Z↓{qj} = 1$, that is,
all permutations such that
$$U↓{i-1} > U↓i < \cdots < U↓{i+p-1} > U↓{i+p} < \cdots < U↓{i+p+q-1}.\eqno
(15)$$
The number of such permutations may be enumerated
as follows: there are $n\choose p+q+1$ ways to choose the
elements for the positions indicated in (15); there are
$$(p + q + 1){p+q\choose p} - {p+q+1\choose p+1}-{p+q+1\choose 1}+1\eqno(16)$$
ways to arrange them in the order (15), as shown
in exercise 13; and there are $(n - p - q - 1)!$ ways to arrange
the remaining elements. Thus there are ${n\choose p+q+1}
(n - p - q - 1)!$ times (16) ways in all, and dividing by $n!$ we
get the desired formula.
From relations (14) a rather lengthy calculation leads to
$$
\save1\hjust{$\vcenter{\halign{\lft{$#$}⊗\lft{$#$}\cr
\hjust{mean}(R↑\prime↓t)+f(p,q,n),⊗\quad\hjust{if }p+q≤n,\cr
\hjust{mean}(R↑\prime↓t)-\hjust{mean}(R↑\prime↓p)\hjust{mean}(R↑\prime↓q),⊗\quad
\hjust{if }p+q>n,\cr}}$}
\eqalignno{\hjust{mean}(R↑\prime↓{p})⊗ =\hjust{mean}(Z↓{p1} +
\cdots + Z↓{pn})\cr
⊗= (n + 1)p/(p + 1)! - (p - 1)/p!,\qquad 1 ≤ p ≤ n;⊗(17)\cr
\noalign{\vskip 6pt plus 3pt}
\hjust{covar}(R↑\prime↓{p}, R↑\prime↓{q}) ⊗=\hjust{mean}
(R↑\prime↓{p}R↑\prime↓{q}) -\hjust{mean}(R↑\prime↓{p})\hjust{mean}(R↑\prime↓{q})\cr
⊗= \sum ↓{1≤i,j≤n} {1\over n!} \sum Z↓{pi}Z↓{qj} -
\hjust{mean}(R↑\prime↓p)\hjust{mean}(R↑\prime↓{q})\cr
⊗=\left\{
\box1\right.⊗(18)\cr}$$
where $t = \max(p, q)$, $s = p + q$, and
$$\eqalign{f(p, q, n) ⊗= (n + 1)\left({s(1 - pq) + pq\over (p
+ 1)!(q + 1)!} - {2s\over (s + 1)!}\right) + 2\left(s - 1\over
s!\right)\cr
⊗\qquad\null + {(s↑2 - s - 2)pq - s↑2 - p↑2q↑2 + 1\over
(p + 1)! (q + 1)!} .\cr}\eqno (19)$$
This expression for the covariance is unfortunately
quite complicated, but it is necessary for a successful run
test as described above. From these formulas it is easy to compute
$$\eqalign{\hjust{mean}(R↓p)⊗=\hjust{mean}(R↑\prime↓{p}) -
\hjust{mean}(R↑\prime↓{p+1}),\cr
\noalign{\vskip 2pt plus 2pt}
\hjust{covar}(R↓p, R↑\prime↓{q})
⊗=\hjust{covar}(R↑\prime↓{p}, R↑\prime↓{q}) -\hjust{covar}(R↑\prime↓{p+1},
R↑\prime↓{q}),\cr
\noalign{\vskip 2pt plus 2pt}
\hjust{covar}(R↓p, R↓q) ⊗=\hjust{covar}(R↓p, R↑\prime↓{q})
-\hjust{covar}(R↓p, R↑\prime↓{q+1}).\cr}\eqno(20)$$
In {\sl Annals Math.\ Stat.\ \bf 15}
(1944), 163--165, J. Wolfowitz proved that the quantities $R↓1$,
$R↓2$, $\ldotss$, $R↓{t-1}$, $R↑\prime↓{t}$ become normally distributed
as $n → ∞$, subject to the mean and covariance expressed above;
this implies that the following test for runs is valid: Given
a sequence of $n$ random numbers, compute the number of runs
$R↓p$ of length $p$ for $1 ≤ p < t$, and also the number of
runs $R↑\prime↓{t}$ of length $≥t$. Let
$$\vcenter{\halign{\ctr{$#$}\cr
Q↓1 = R↓1 -\hjust{mean}(R↓1),\quad \ldotss,\quad
Q↓{t-1} = R↓{t-1} -\hjust{mean}(R↓{t-1}),\cr
Q↓t = R↑\prime↓t-\hjust{mean}(R↑\prime↓{t}).\cr}}\eqno(21)$$
%folio 83 galley 8 (C) Addison-Wesley 1978 *
Form the matrix $C$ of the covariances of the
$R$'s; for example, $C↓{13} = \hjust{covar}(R↓1, R↓3)$, while $C↓{1t}=\hjust{covar}
(R↓1, R↑\prime↓{t})$. When $t = 6$, we have
$$\def\\{\char'55 } % minus sign changed to hyphen, not enough room otherwise
\eqalign{C⊗= nC↓1 + C↓2\cr
\noalign{\vskip 6pt}
⊗=n\left(\vcenter{\halign{\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad
\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad\ctr{$#$}\cr
{23\over 180} ⊗{\\7\over 360} ⊗{\\5\over 336} ⊗{\\433\over
60480} ⊗{\\13\over 5670} ⊗{\\121\over 181440} \cr\noalign{\vskip 3pt}
{\\7\over 360} ⊗{2843\over 20160} ⊗{\\989\over 20160}
⊗{\\7159\over 362880} ⊗{\\10019\over 1814400} ⊗{\\1303\over
907200} \cr\noalign{\vskip 3pt}
{\\5\over 336} ⊗{\\989\over 20160} ⊗{54563\over 907200}
⊗{\\21311\over 1814400} ⊗{\\62369\over19958400} ⊗{\\7783\over
9979200} \cr\noalign{\vskip 3pt}
{\\433\over 60480} ⊗{\\7159\over 362880} ⊗{\\21311\over
1814400} ⊗{886657\over 39916800} ⊗{\\257699\over 239500800}
⊗{\\62611\over 239500800} \cr\noalign{\vskip 3pt}
{\\13\over 5670} ⊗{\\10019\over 1814400} ⊗{\\62369\over
19958400} ⊗{\\257699\over 239500800} ⊗{29874811\over 5448643200}
⊗{\\1407179\over 21794572800} \cr\noalign{\vskip 3pt}
{\\121\over 181440} ⊗{\\1303\over 907200} ⊗{\\7783\over
9979200} ⊗{\\62611\over 239500800} ⊗{\\1407179\over 21794572800}
⊗{2134697\over 1816214400} \cr
}}\right)\cr
\noalign{\vskip 6pt}
⊗\quad\null+\left(\vcenter{\halign{\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad
\ctr{$#$}⊗\quad\ctr{$#$}⊗\quad\ctr{$#$}\cr
{83\over 180} ⊗{\\29\over 180} ⊗{\\11\over
210} ⊗{\\41\over 12096} ⊗{91\over 25920} ⊗{41\over 18144} \cr\noalign{\vskip 3pt}
{\\29\over 180} ⊗{\\305\over 4032} ⊗{319\over 20160}
⊗{2557\over 72576} ⊗{10177\over 604800} ⊗{413\over 64800} \cr\noalign{\vskip 3pt}
{\\11\over 210} ⊗{319\over 20160} ⊗{\\58747\over
907200} ⊗{19703\over 604800} ⊗{239471\over 19958400} ⊗{39517\over
9979200} \cr\noalign{\vskip 3pt}
{\\41\over 12096} ⊗{2557\over 72576} ⊗{19703\over
604800} ⊗{\\220837\over 4435200} ⊗{1196401\over 239500800}
⊗{360989\over 239500800} \cr\noalign{\vskip 3pt}
{91\over 25920} ⊗{10177\over 604800} ⊗{239471\over
19958400} ⊗{1196401\over 239500800} ⊗{\\139126639\over 7264857600}
⊗{4577641\over 10897286400} \cr\noalign{\vskip 3pt}
{41\over 18144} ⊗{413\over 64800} ⊗{39517\over
9979200} ⊗{360989\over 239500800} ⊗{4577641\over 10897286400}
⊗{\\122953057\over 21794572800} \cr
}}\right)\cr}\eqno(22)$$
if $n ≥ 12$. Now form $A = (a↓{ij})$, the inverse
of the matrix $C$, and compute $\sum ↓{1≤i,j≤t}Q↓iQ↓ja↓{ij}$.
The result for large $n$ should have approximately the chi-square distribution with
$t$ degrees of freedom.
The matrix (11) given earlier is the inverse of
$C↓1$ to five significant figures. When $n$ is large, $A$ will
be approximately $(1/n)C↑{-1}↓{1}$;
a test with
$n = 100$ showed that the entries $a↓{ij}$ in (11) were each about
1 percent lower than the true values obtained by inverting (22).
\subsectionbegin{H. Maximum-of-t test} For
$0 ≤ j < n$, let $V↓j = \max(U↓{tj}, U↓{tj+1}, \ldotss , U↓{tj+t-1})$.
Now apply the Kolmogorov--\hskip 1pt Smirnov test to the sequence $V↓0$,
$V↓1$, $\ldotss$, $V↓{n-1}$, with the distribution function $F(x)
= x↑t$, $0 ≤ x ≤ 1$. Alternatively, apply the equidistribution
test to the sequence $V↑{t}↓{0}$, $V↑{t}↓{1}$, $\ldotss$, $V↑{t}↓{n-1}$.
To verify this test, we must show that the distribution
function for the $V↓j$ is $F(x) = x↑t$. The probability
that max$(U↓1, U↓2, \ldotss , U↓t) ≤ x$ is the probability that
$U↓1 ≤ x$ and $U↓2 ≤ x$ and $\ldots$ and $U↓t ≤ x$, which is the
product of the individual probabilities, namely $x x
\ldotsm x = x↑t$.
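\yskip In programmed form this test is very short; the sketch below
(illustrative only) returns the values $V↑{t}↓j$, which may then be subjected
to the equidistribution test, or the $V↓j$ themselves may be given to a
Kolmogorov--Smirnov routine with $F(x)=x↑t$.

def maximum_of_t(U, t):
    # V_j = max(U[tj], ..., U[tj+t-1]); each V_j**t should be uniform in [0,1).
    n = len(U) // t
    V = [max(U[t * j: t * (j + 1)]) for j in range(n)]
    return [v ** t for v in V]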
\subsectionbegin{I. Collision test} Chi-square tests can be made only
when there is a nontrivial number of items expected in each category.
But there is another kind of test that can be used when the number
of categories is much larger than the number of observations; this
test is related to ``hashing,'' an important method for information
retrieval that we shall study in Chapter 6.
Suppose we have $m$ urns and we throw $n$ balls at random into those
urns, where $m$ is much greater than $n$. Most of the balls will land
in urns that were previously empty, but if a ball falls into an urn
that already contains at
least one ball we say that a ``collision'' has occurred.
The collision test counts the number of collisions, and
a generator passes this test if it doesn't induce too many or too
few collisions.
To fix the ideas, suppose $m=2↑{20}$ and $n=2↑{14}$. Then each urn
will receive only one 64th of a ball, on the average. The probability that
a given urn will contain exactly $k$ balls is $p↓k={n\choose k}m↑{-k}
(1-m↑{-1})↑{n-k}$, so the expected number of collisions per urn is
$\sum↓{k≥1}(k-1)p↓k=\sum↓{k≥0}kp↓k-\sum↓{k≥1}p↓k=n/m-1+p↓0$. Since
$p↓0=(1-m↑{-1})↑n=1-n/m+{n\choose2}m↑{-2}+\null$smaller terms, we
find that the average total number of collisions taken over all $m$
urns is very slightly less than $n↑2\!/2m=128$.
We can use the collision test to rate a random-number generator
in a large number of dimensions. For example, when $m=2↑{20}$ and
$n=2↑{14}$ we can test the 20-dimensional randomness of a number generator
by letting $d=2$ and forming 20-dimensional vectors $V↓j=(Y↓{20j},
Y↓{20j+1},\ldotss,Y↓{20j+19})$ for $0≤j<n$. It suffices to keep a
table of $m=2↑{20}$ bits to determine collisions, one bit for each
possible value of the vector $V↓j$; on a computer with 32 bits per word,
this amounts to $2↑{15}$ words. Initially all $2↑{20}$ bits of this table
are cleared to zero; then for each $V↓j$, if the corresponding bit is
already 1 we record a collision, otherwise we set the bit to 1. This test
can also be used in 10 dimensions with $d=4$, and so on.
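\yskip A sketch of this procedure appears below (illustrative only); a Python
bytearray of $2↑{20}$ entries stands in for the table of $2↑{20}$ bits described
in the text.

def collision_test(Y, dims=20, d=2):
    # Throw the vectors V_j = (Y[dims*j], ..., Y[dims*j+dims-1]) into
    # m = d**dims urns and count collisions; the total is to be compared
    # with the percentage points tabulated in the text.
    m = d ** dims
    n = len(Y) // dims
    table = bytearray(m)                     # one entry per urn, initially zero
    collisions = 0
    for j in range(n):
        v = 0
        for i in range(dims):
            v = v * d + Y[dims * j + i]      # urn number determined by V_j
        if table[v]:
            collisions += 1
        else:
            table[v] = 1
    return collisions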
To decide if the test is passed, we can use the following table of percentage
points when $m=2↑{20}$ and $n=2↑{14}$:
$$\vcenter{\halign{\rt{#}⊗\quad\rt{#}⊗\quad\rt{#}⊗\quad\rt{#}⊗\quad\rt{#}⊗\quad
\rt{#}⊗\quad\rt{#}⊗\quad\rt{#}\cr
collisions $≤$⊗101⊗108⊗119⊗126⊗134⊗145⊗153\cr
with probability⊗.009⊗.043⊗.244⊗.476⊗.742⊗.946⊗.989\cr}}$$
The theory underlying these probabilities is the same as we used in the poker
test, Eq.\ (5); the probability that $c$ collisions occur is the
probability that $n-c$ urns are occupied, namely
$${m(m-1)\ldotsm(m-n+c+1)\over m↑n}{n\comb\{\}n-c}.$$
Although $m$ and $n$ are very large, it is not difficult to compute these
probabilities using the following method:
\algbegin Algorithm S (Percentage points for collision test). Given $m$ and
$n$, this algorithm determines the distribution of the number of collisions
that occur when $n$ balls are scattered into $m$ urns. An auxiliary array $A[0]$,
$A[1]$, $\ldotss$, $A[n]$ of floating-point numbers is used for the
computation; actually $A[j]$ will be nonzero only for $j↓0≤j≤j↓1$, and
$j↓1-j↓0$ will be at most of order $\log n$, so it would be possible to
get by with considerably less storage.
\algstep S1. [Initialize.] Set $A[j]←0$ for $0≤j≤n$; then set $A[1]←1$ and
$j↓0←j↓1←1$. Then do step S2 exactly $n-1$ times and go on to step S3.
\algstep S2. [Update probabilities.] (Each time we do this step it
corresponds to tossing a ball into an urn; $A[j]$ represents the
probability that exactly $j$ of the urns are occupied.) Set $j↓1←j↓1+1$.
Then for $j←j↓1$, $j↓1-1$, $\ldotss$, $j↓0$ (in this order), set
$A[j]←(j/m)A[j]+\biglp(1+1/m)-(j/m)\bigrp A[j-1]$. If $A[j]$ has become
very small as a result of this calculation, say $A[j]<10↑{-20}$, set
$A[j]←0$; and in such a case, if $j=j↓1$ decrease $j↓1$ by 1, or if
$j=j↓0$ increase $j↓0$ by 1.
\algstep S3. [Compute the answers.] In this step we make use of an auxiliary
table $(T↓1,T↓2,\ldotss,T↓{\char'164\char'155\char'141\char'170})
=(.01,.05,.25,.50,.75,.95,.99,1.00)$
containing the speci\-fied percentage points of interest. Set $p←0$, $t←1$, and
$j←j↓0-1$. Do the following iteration until $t =\;$tmax: Increase $j$ by
1, and set $p←p+A[j]$; then if $p>T↓t$, output $n-j-1$ and $1-p$ (meaning
that with probability $1-p$ there are at most $n-j-1$ collisions) and
repeatedly increase $t$ by 1 until $p≤T↓t$.\quad\blackslug
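\yskip A fairly direct transcription of Algorithm S into Python follows
(illustrative only); the cutoff $10↑{-20}$ and the table of percentage points
are those of the text.

def collision_points(m, n, T=(.01, .05, .25, .50, .75, .95, .99, 1.00)):
    # Algorithm S: A[j] = probability that exactly j urns are occupied
    # after the balls tossed so far.
    A = [0.0] * (n + 1)
    A[1] = 1.0
    j0 = j1 = 1
    for _ in range(n - 1):                   # step S2, performed n-1 times
        j1 += 1
        for j in range(j1, j0 - 1, -1):
            A[j] = (j / m) * A[j] + ((1 + 1 / m) - j / m) * A[j - 1]
            if A[j] < 1e-20:
                A[j] = 0.0
                if j == j1:
                    j1 -= 1
                elif j == j0:
                    j0 += 1
    points = []                              # step S3
    p, t, j = 0.0, 0, j0 - 1
    while t < len(T) - 1:                    # stop when t reaches the entry 1.00
        j += 1
        p += A[j]
        if p > T[t]:
            points.append((n - j - 1, 1 - p))   # at most n-j-1 collisions, prob 1-p
            while t < len(T) - 1 and p > T[t]:
                t += 1
    return points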
\subsectionbegin{J. Serial correlation test}
Compute the following statistic:
$$C={\vjust{\halign{\rt{$#$}\cr
n(U↓0U↓1 + U↓1U↓2 + \cdots + U↓{n-2}U↓{n-1} + U↓{n-1}U↓0)\qquad\qquad\cr
- (U↓0 + U↓1 +\cdots+ U↓{n-1})↑2\cr}}
\over n(U↑{2}↓0 + U↑{2}↓{1}
+\cdots + U↑{2}↓{n-1}) - (U↓0 + U↓1 +\cdots
+ U↓{n-1})↑2}.\eqno (23)$$
This is the ``serial correlation coefficient,''
which is a measure of the amount $U↓{j+1}$ depends on $U↓j$.
Correlation coefficients appear frequently in statistics;
if we have $n$ quantities $U↓0$, $U↓1$, $\ldotss$, $U↓{n-1}$ and $n$
others $V↓0$, $V↓1$, $\ldotss$, $V↓{n-1}$, the correlation coefficient
between them is defined to be
$$C = {n \sum (U↓jV↓j) - (\sum U↓j)(\sum V↓j)\over
\sqrt{\biglp n \sum U↑{2}↓j - (\sum U↓{j})↑2\bigrp\biglp
n \sum V↑{2}↓{j} - (\sum V↓j)↑2\bigrp}}.\eqno(24)$$
All summations in this formula are to be taken
over the range $0 ≤ j < n$; Eq.\ (23) is the special case
$V↓j = U↓{(j+1)\mod n}$. ({\sl Note:} The denominator of (24)
is zero when $U↓0 = U↓1 =\cdots = U↓{n-1}$ or $V↓0
= V↓1 =\cdots = V↓{n-1}$; we exclude this case from
discussion.)
A correlation coefficient always lies between $-1$
and $+1$. When it is zero or very small, it indicates that the
quantities $U↓j$ and $V↓j$ are (relatively speaking) independent
of each other, but when the correlation coefficient is $\pm 1$
it indicates total linear dependence; in fact $V↓j = α \pm βU↓j$
for all $j$ in such a case, for some constants $α$ and $β$.
(See exercise 17.)
Therefore it is desirable to have $C$ in Eq.\ (23)
close to zero. In actual fact, since $U↓0U↓1$ is not completely
independent of $U↓1U↓2$, the serial correlation coefficient
is not expected to be {\sl exactly} zero. (See exercise 18.)
A ``good'' value of $C$ will be between $\mu ↓n - 2\sigma ↓n$
and $\mu ↓n + 2\sigma ↓n$, where
$$\mu ↓n = {-1\over n - 1} ,\qquad \sigma ↓n = {1\over n
- 1} \sqrt{n(n - 3)\over n + 1} ,\qquad n > 2.\eqno (25)$$
We expect $C$ to be between these limits about
95 percent of the time.
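\yskip In programmed form (an illustrative sketch, valid for $n > 2$):

def serial_correlation(U):
    # Eq. (23), together with the bounds mu_n - 2*sigma_n and mu_n + 2*sigma_n
    # of Eq. (25).
    n = len(U)
    s1 = sum(U)
    s2 = sum(u * u for u in U)
    cross = sum(U[j] * U[(j + 1) % n] for j in range(n))
    C = (n * cross - s1 * s1) / (n * s2 - s1 * s1)
    mu = -1.0 / (n - 1)
    sigma = (1.0 / (n - 1)) * (n * (n - 3) / (n + 1)) ** 0.5
    return C, mu - 2 * sigma, mu + 2 * sigma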
Equations (25) are only conjectured at this time,
since the exact distribution of $C$ is not known when the $U$'s
are uniformly distributed. For the theory when the $U$'s have
the {\sl normal} distribution, see the paper by Wilfrid J. Dixon,
{\sl Annals Math.\ Stat.\ \bf 15} (1944), 119--144.
Empirical evidence suggests that we may safely use the formulas for the
mean and standard deviation which have been derived from the
assumption of the normal distribution, without much error; these
are the values which have been listed in (25). It is known that
$\lim↓{n→∞}\sqrt{n} \sigma ↓n = 1$;
cf.\ the article by Anderson and Walker, {\sl Annals Math.\ Stat.\
\bf 35} (1964), 1296--1303, where more general
results about serial correlations of {\sl dependent} sequences are derived.
Instead of simply computing the correlation coefficient between the observations
$(U↓0,
U↓1,\ldotss,U↓{n-1})$ and their immediate successors
$(U↓1,\ldotss,U↓{n-1},U↓0)$, we can also
compute it between $(U↓0,U↓1,\ldotss,U↓{n-1})$ and any cyclically shifted
sequence $(U↓q,\ldotss,U↓{n-1},U↓0,\ldotss,U↓{q-1})$; the correlation
should be small for $0<q<n$. A straightforward computation of Eq.\
(24) for all $q$ would require about $n↑2$ multiplications, but it is
actually possible to compute all the correlations in only $O(n\log n)$ steps
by using ``fast Fourier transforms.'' $\biglp$See Section 4.6.4; cf.\ also L. P.
Schmid, {\sl CACM \bf8} (1965), 115.$\bigrp$
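\yskip The fast-Fourier-transform idea can be sketched as follows (illustrative
only; this sketch leans on the numpy library rather than on the methods of
Section 4.6.4): the circular cross-correlations $\sum↓j U↓jU↓{(j+q)\mod n}$ for
all $q$ are obtained at once from the transform of the sequence.

import numpy as np

def all_shift_correlations(U):
    # Correlation coefficient (24) between (U_0,...,U_{n-1}) and each cyclic
    # shift (U_q,...,U_{q-1}), for 0 <= q < n, in O(n log n) operations.
    U = np.asarray(U, dtype=float)
    n = len(U)
    F = np.fft.fft(U)
    cross = np.fft.ifft(np.conj(F) * F).real    # cross[q] = sum_j U_j U_{(j+q) mod n}
    s1, s2 = U.sum(), (U * U).sum()
    return (n * cross - s1 * s1) / (n * s2 - s1 * s1)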
%galley 9 (which was not on paper tape) (C) Addison-Wesley 1978 *
\subsectionbegin{K. Tests on subsequences} It frequently happens that
the external program using our random sequence will call for
numbers in batches. For example, if the program works with three random
variables $X$, $Y$, and $Z$, it may consistently invoke the generation
of three random numbers at a time. In such applications it is important
that the subsequences consisting of every {\sl third} term of the original
sequence be random. If the program requires $q$ numbers at a time, the
sequences
$$U↓0,U↓q,U↓{2q},\ldotss;\quad U↓1,U↓{q+1},U↓{2q+1},\ldotss;\quad\ldotss;
\quad U↓{q-1},U↓{2q-1},U↓{3q-1},\ldotss$$
can each be put through the tests described above for the original sequence
$U↓0$, $U↓1$, $U↓2$, $\ldotss\,$.
Experience with linear congruential sequences has shown that these
derived sequences rarely if ever behave less randomly than the original
sequence, unless $q$ has a large factor in common with the period length.
On a binary computer with $m$ equal to the word size, for example, a test
of the subsequences for $q=8$ will tend to give the poorest randomness for all
$q<16$; and on a decimal computer, $q=10$ is the most likely to be
unsatisfactory. (This can be explained somewhat on the grounds of potency,
since such values of $q$ will tend to lower the potency.)
\subsectionbegin{L. Historical remarks and further discussion}
Statistical tests arose naturally in the course of scientists' efforts to
``prove'' or ``disprove'' hypotheses about various observed data. The
best known early papers dealing with the testing of artificially-generated
numbers for randomness are two articles by M. G. Kendall and B.
Babington-Smith in the {\sl Journal of the Royal Statistical Society \bf 101}
(1938), 147--166, and in the supplement to that journal, {\bf 6} (1939),
51--61. These papers were concerned with the testing of random digits between
0 and 9, rather than random real numbers; for this purpose, the authors
discussed the frequency test, serial test, gap test, and poker test, although
they misapplied the serial test. Kendall and Babington-Smith also used a
variant of the coupon collector's test, but the method described in this
section was introduced by R. E. Greenwood in 1955.
The run test has a rather interesting history. Originally, tests were made
on runs up and down at once: a run up would be followed by a run down, then
another run up, and so on. Note that the run test and the permutation test
do not depend on the uniform distribution of the $U$'s, they depend only
on the fact that $U↓i=U↓j$ occurs with probability zero when $i≠j$; therefore
these tests can be applied to many types of random sequences. The run test
in primitive form was originated by J. Bienaym\'e [{\sl Comptes Rendus \bf 81}
(Paris: Acad. Sciences, 1875), 417--423]. Some sixty years later, W. O.
Kermack and A. G. McKendrick published two extensive papers on the subject
[{\sl Proc.\ Royal Society Edinburgh \bf57} (1937), 228--240, 332--376];
as an example they stated that Edinburgh rainfall between the years 1785 and 1930
was ``entirely random in character'' with respect to the run test (although
they examined only the mean and standard deviation of the run lengths). Several
other people began using the test, but it was not until 1944 that the use of
the chi-square method in connection with this test was shown to be incorrect.
The paper by H. Levene and J. Wolfowitz, in {\sl Annals Math.\ Stat.\ \bf15}
(1944), 58--69, introduced the correct run test (for runs up and down,
alternately) and discussed the fallacies in earlier misuses of that test.
Separate tests for runs up and runs down, as proposed in the text above, are
more suited to computer application, so we have not given the more complex
formulas for the alternate-up-and-down case. See the survey paper by D. E.
Barton and C. L. Mallows, {\sl Annals Math.\ Stat.\ \bf36} (1965),
236--260.
Of all the tests we have discussed, the frequency test and the serial correlation
test seem to be the weakest, in the sense that nearly all random-number generators
pass these tests. Theoretical grounds for the weakness of these tests are
discussed briefly in Section 3.5 (cf.\ exercise 3.5--26). The run test, on the
other hand, is a rather strong test: the results of exercises 3.3.3--23 and 24
suggest that linear congruential generators tend to have runs somewhat
longer than normal if the multiplier is not large enough, so the run test of
exercise 14 is definitely to be recommended.
The collision test is also highly recommended, since it has been especially
designed to detect the deficiencies of many poor generators that have unfortunately
become widespread. This test, which is based on ideas of H. Delgas
Christiansen [Inst.\ Math.\ Stat.\ and Oper.\ Res., Tech.\ Univ.\
Denmark (Oct.\ 1975), unpublished], is unlike the others in that it was not
developed before the advent of computers; it is specifically intended for
computer use.
\yskip The reader probably wonders, {\sl``Why are there so many tests?''\/} It has
been said that more computer time is spent testing random numbers
than using them in applications! This is untrue, although it is possible to
go overboard in testing.
The need for making several tests has been amply documented. It has been
recorded, for example, that some numbers generated by a variant of the
middle-square method have passed the frequency test, gap test, and poker test, yet
flunked the serial test. Linear congruential sequences with small multipliers
have been known to pass many tests, yet fail on the run test because there are
too few runs of length one. The maximum-of-$t$ test has also been used to ferret
out some bad generators which otherwise seemed to perform respectably.
Perhaps the main reason for doing extensive testing on random-number generators
is that people misusing Mr.\ X's random-number generator will hardly ever
admit that their programs are at fault: they will blame the
generator, until Mr.\ X can {\sl prove} to them that his numbers are
sufficiently random. On the other hand, if the source of random numbers is only
for Mr.\ X's personal use, he might decide not to bother to test them, since
the techniques recommended in this chapter have a high
probability of being satisfactory.
\exbegin{EXERCISES}
\exno 1. [10] Why should the serial test described in part B be applied to
$(Y↓0,Y↓1)$, $(Y↓2,Y↓3)$, $\ldotss$, $(Y↓{2n-2},Y↓{2n-1})$ instead of to
$(Y↓0,Y↓1)$, $(Y↓1,Y↓2)$, $\ldotss$, $(Y↓{n-1},Y↓n)$?
\exno 2. [10] State an appropriate way to generalize the serial test to
triples, quadruples, etc., instead of pairs.
\trexno 3. [M20] How many $U$'s need to be examined in the gap test (Algorithm
G) before $n$ gaps have been found, on the average, assuming that the sequence
is random? What is the standard deviation of this quantity?
\exno 4. [12] Prove that the probabilities in (4) are correct for the gap test.
\exno 5. [M23] The ``classical'' gap test used by Kendall and Babington-Smith
considers the numbers $U↓0$, $U↓1$, $\ldotss$, $U↓{N-1}$ to be a cyclic
sequence with $U↓{N+j}$ identified with $U↓j$. Here $N$ is a fixed number of
$U$'s which are to be subjected to the test. If $n$ of the numbers $U↓0$,
$\ldotss$, $U↓{N-1}$ fall into the range $α≤U↓j<β$, there are $n$ gaps in the
cyclic sequence. Let $Z↓r$ be the number of gaps of length $r$, for $0≤r<t$,
and let $Z↓t$ be the number of gaps of length $≥t$; show that the quantity
$V=\sum↓{0≤r≤t}(Z↓r-np↓r)↑2/np↓r$ should have the chi-square distribution
with $t$ degrees of freedom, in the limit as $N$ goes to infinity, where
$p↓r$ is given in Eq.\ (4).
\exno 6. [40] (H. Geiringer.)\xskip A frequency count of the first 2000 decimal
digits in the representation of $e=2.71828\ldots$ gave a $\chi↑2$ value of
1.06, indicating that the actual frequencies of the digits 0, 1, $\ldotss$,
9 are much too close to their expected values to be considered
randomly distributed. (In fact, $\chi↑2≥1.15$ with probability 99.9 percent.)
The same test applied to the first 10,000 digits of $e$ gives the reasonable
value $\chi↑2=8.61$; but the fact that the first 2000 digits are so evenly
distributed is still surprising. Does the same phenomenon occur in the
representation of $e$ to other bases? [See {\sl AMM \bf72} (1965), 483--500.]
\exno 7. [08] Apply the coupon collector's test procedure (Algorithm C) with
$d=3$ and $n=7$, to the following sequence: 1101221022120202001212201010201121.
What lengths do the seven subsequences have?
\trexno 8. [M22] How many $U$'s need to be examined, on the average, in the
coupon collector's test (Algorithm C) before $n$ complete sets have been found,
assuming that the sequence is random? What is the standard deviation?\xskip
[{\sl Hint:} See Eq.\ 1.2.9--28.]
\exno 9. [M21] Generalize the coupon collector's test so that the search stops
as soon as $w$ distinct values have been found, where $w$ is a fixed positive
integer less than or equal to $d$. What probabilities should be used in
place of (6)?
\exno 10. [M23] Solve exercise 8 for the more general coupon collector's test
described in exercise 9.
\exno 11. [00] The ``runs up'' in a particular permutation are displayed in
(9); what are the ``runs down'' in that permutation?
\exno 12. [20] Let $U↓0$, $U↓1$, $\ldotss$, $U↓{n-1}$ be $n$ distinct numbers.
Write an algorithm which determines the lengths of all ascending runs in the
sequence. When your algorithm terminates, {\tt COUNT}$[r]$ should
be the number of runs of length $r$, for $1≤r≤5$, and {\tt COUNT}[6] should be
the number of runs of length 6 or more.
\exno 13. [M23] Show that (16) is the number of permutations of $p+q+1$ distinct
elements having the pattern (15).
%folio 88 galley 10 (C) Addison-Wesley 1978 *
\trexno 14. [M15] If we ``throw
away'' the element which immediately follows a run, so that
when $X↓j$ is greater than $X↓{j+1}$ we start the next run with
$X↓{j+2}$, the run lengths are independent, and a simple chi-square
test may be used (instead of the horribly complicated method
derived in the text). What are the appropriate run-length probabilities
for this simple run test?
\exno 15. [M10] In the ``maximum of $t\,$'' test, why are $V↑{t}↓{0}$,
$V↑{t}↓{1}$, $\ldotss$, $V↑{t}↓{n-1}$ supposed to be uniformly distributed
between zero and one?
\trexno 16. [15] (a) Mr. J. H. Quick (a student) wanted to perform
the maximum-of-$t$ test for various values of $t$. Letting
$Z↓{jt} = \max(U↓j, U↓{j+1}, \ldotss , U↓{j+t-1})$, he found
a clever way to go from the sequence $Z↓{0(t-1)}$, $Z↓{1(t-1)}$,
$\ldotss $, to the sequence $Z↓{0t}$, $Z↓{1t}$, $\ldotss $, using very
little time and space. What was his bright idea?
(b) He decided to modify the maximum-of-$t$
method so that the $j$th observation would be
$\max(U↓j, \ldotss , U↓{j+t-1})$; in other
words, he took $V↓j = Z↓{jt}$ instead of $V↓j = Z↓{(tj)t}$ as
the text says. He reasoned that {\sl all} of the $Z$'s should
have the same distribution, so the test is even stronger if
each $Z↓{jt}$, $0 ≤ j < n$, is used instead of just every $t$th
one. But when he tried a chi-square equidistribution test on
the values of $V↑{t}↓{j}$, he got extremely high values of the
statistic $V$, which got even higher as $t$ increased. Why did
this happen?
\exno 17. [M25] (a) Given any numbers $U↓0, \ldotss , U↓{n-1}, V↓0,
\ldotss , V↓{n-1}$, let
$$\=u={1\over n}\sum↓{0≤k<n}U↓k,\qquad\qquad\=v={1\over n}\sum↓{0≤k<n}V↓k.$$
Let $U↑\prime↓{k} = U↓k - \=u$, $V↑\prime↓{k}
= V↓k - \=v$. Show that the correlation coefficient $C$
given in Eq.\ (24) is equal to
$$\sum ↓{0≤k<n} U↑\prime↓{k}V↑\prime↓{k}\left/\ssqrt{\sum↓{0≤k<n}
{U↑\prime↓k}↑2}\ssqrt{\sum↓{0≤k<n}{V↑\prime↓k}↑2}\right..$$
(b) Let $C = N/D$, where $N$ and
$D$ denote the numerator and denominator of the expression in
part (a). Show that $N↑2 ≤ D↑2$, hence $-1 ≤ C ≤ 1$; and obtain
a formula for the difference $D↑2 - N↑2$.\xskip [{\sl Hint:} See exercise
1.2.3--30.]
(c) If $C = \pm 1$, show that $αU↓k + βV↓k = \tau$,
$0 ≤ k < n$, for some constants $α$, $β$, and $\tau$, not all zero.
\exno 18. [M20] (a) Show that if $n = 2$, the serial correlation
(23) is always equal to $-1$ (unless the denominator is zero).\xskip
(b) Similarly, show that when $n = 3$, the serial correlation
coefficient always equals $-{1\over 2}$.\xskip (c) Show that the denominator
in (23) is zero if and only if $U↓0 = U↓1 =\cdots
= U↓{n-1}$.
\exno 19. [M40] What are the mean and standard deviation of
the serial correlation coefficient (23) when $n = 4$ and the
$U$'s are independent and uniformly distributed between zero
and one?
\exno 20. [M47] Find the distribution of the serial correlation
coefficient (23), for general $n$, assuming that the $U↓j$ are
independent random variables uniformly distributed between zero
and one.
\exno 21. [19] What value of $f$ is computed by Algorithm P
if it is presented with the permutation $(1, 2, 9, 8, 5, 3, 6,
7, 0, 4)$?
\exno 22. [18] For what permutation of the integers $\{0, 1, 2,
3, 4, 5, 6, 7, 8, 9\}$ will Algorithm P produce
the value $f=1024$?
\runningrighthead{THEORETICAL TESTS} section{3.3.3}
\sectionskip
\sectionbegin{\star3.3.3. Theoretical Tests}
Although it is
always possible to test a random-number generator using the
methods in the previous section, it is far better to have ``{\sl
a priori} tests,'' i.e., theoretical results which tell us in
advance how well those tests will come out. Such theoretical results
give us much more understanding about the generation
methods than empirical, ``trial-and-error'' results do. In this
section we shall study the linear congruential sequences in
more detail; if we know what the results of certain tests will be before we actually
generate the numbers, we have a better chance of choosing $a$,
$m$, and $c$ properly.
The development of this kind of theory is quite
difficult, although some progress has been made. The results
obtained so far are generally for {\sl statistical tests made
over the entire period.} Not all statistical tests make sense
when they are applied over a full period---for example, the equidistribution
test will give results which are too perfect---but
the serial test, gap test, permutation
test, maximum test, etc.\ can be fruitfully analyzed in this way. Such
studies will detect {\sl global\/} nonrandomness of a sequence, i.e.,
improper behavior in very large samples.
The theory we shall discuss is quite illuminating, but it does not eliminate
the need for testing local nonrandomness by the methods of Section 3.3.2.
Indeed, it appears to be extremely hard to prove anything useful about
short subsequences. Only a few theoretical results are known about the behavior
of linear congruential sequences over less than a full period; these
will be discussed at the end of Section 3.3.4. (See also exercise 18.)
Let us begin with a proof of a simple {\sl a priori}
law, for the least complicated case of the permutation test.
The gist of our first theorem is that we have $X↓{n+1} < X↓n$
about half the time, provided that the sequence has high potency.
\thbegin Theorem P. {\sl Let $a$, $c$, and
$m$ generate a linear congruential sequence with maximum period;
let $b = a - 1$ and let $d$ be the greatest common divisor of
$m$ and $b$. The probability that $X↓{n+1} < X↓n$ is equal to ${1\over
2} + r$, where
$$r = \biglp 2(c\mod d) - d\bigrp /2m;\eqno (1)$$
hence $|r| < d/2m$.}
\proofbegin The proof of this theorem involves some
techniques which are of interest in themselves. First we define
$$s(x) = (ax + c)\mod m.\eqno (2)$$
Thus, $X↓{n+1} = s(X↓n)$, and the theorem reduces
to counting the number of integers $x$ such that $0 ≤x<m$ and $s(x)<x$
(since each such integer occurs somewhere in the period). We want to
show that this number is
$$\textstyle{1\over2}
\biglp m + 2(c \mod d) - d\bigrp .\eqno (3)$$
The function $\left\lceil \biglp x - s(x)\bigrp
/m\right\rceil$ is equal to 1 when $x > s(x)$, and it is 0 otherwise;
hence the count we wish to obtain can be written simply as
$$\eqalignno{\sum ↓{0≤x<m}\left\lceil x - s(x)\over m\right\rceil⊗ =
\sum ↓{0≤x<m}\left\lceil
{x\over m} - \left({ax + c\over m} -
\left\lfloor ax + c\over m\right\rfloor\right)\right\rceil\cr
⊗= \sum ↓{0≤x<m}\left(\left\lfloor ax+c\over m\right\rfloor-
\left\lfloor bx+c\over m\right\rfloor\right).⊗(4)\cr}$$
(Recall that $\lceil -y\rceil = - \lfloor y\rfloor$
and $b = a - 1$.) Such sums can be evaluated by the method of
exercise 1.2.4--37, where we have proved that
$$\sum ↓{0≤j<k}\left\lfloor hj + c\over k\right\rfloor = {(h - 1)(k - 1)\over
2} + {g - 1\over 2} + g\lfloor c/g\rfloor ,\qquad g = \gcd(h,
k),\eqno (5)$$
whenever $h$ and $k$ are integers and $k > 0$.
Since $a$ is relatively prime to $m$, this formula yields
$$\eqalign{\sum ↓{0≤x<m}\left\lfloor ax + c\over m\right\rfloor⊗= {(a -
1)(m - 1)\over 2} + c,\cr
\sum ↓{0≤x<m}\left\lfloor bx + c\over m\right\rfloor⊗= {(b
- 1)(m - 1)\over 2} + {d - 1\over 2} + c - (c\mod d),\cr}$$
and (3) follows immediately.\quad\blackslug
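\yskip For small parameters the count (3) is easily checked by brute force; the
following sketch (illustrative only) compares a direct count with the formula.
For example, with $a = 137$, $c = 187$, $m = 256$ (a maximum-period choice, for
which $d = 8$), both computations should give 127.

from math import gcd

def descents(a, c, m):
    # Number of x with 0 <= x < m and s(x) = (a*x + c) mod m < x,
    # versus the value (m + 2*(c mod d) - d)/2 asserted by the proof,
    # where d = gcd(a - 1, m); a is assumed relatively prime to m.
    direct = sum(1 for x in range(m) if (a * x + c) % m < x)
    d = gcd(a - 1, m)
    return direct, (m + 2 * (c % d) - d) // 2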
%folio 90 galley 11 (C) Addison-Wesley 1978 *
\yyskip The proof of Theorem P indicates that {\sl
a priori} tests can indeed be carried out, provided that we are able
to deal satisfactorily with sums involving the $\lfloor\; \rfloor$
and $\lceil\; \rceil$ functions. In many cases the most powerful
technique for dealing with floor and ceiling functions is to
replace them by two somewhat more symmetrical ones:
$$
\eqalignno{\delta(z)⊗=\lfloor z\rfloor + 1-\lceil z\rceil =
\left\{\vcenter{\halign{#⊗\qquad\hjust{if $z$ }#\hfill\cr
1,⊗is an integer;\cr 0,⊗is not an integer;\cr}}\right.⊗(6)\cr
\noalign{\vskip 4pt}
((z))⊗\textstyle= z - \lfloor z\rfloor - {1\over 2}
+ {1\over 2}\delta (z) = z - \lceil z\rceil + {1\over 2} - {1\over
2}\delta (z).⊗(7)\cr}$$
The latter function is a ``sawtooth''
function familiar in the study of Fourier series; its graph
is shown in Fig.\ 7. The reason for choosing to work with $((z))$
rather than $\lfloor z\rfloor$ or $\lceil
z\rceil$ is that $((z))$ possesses several very useful
properties:
$$\eqalignno{((-z))⊗=-((z));⊗(8)\cr
\noalign{\vskip 4pt}
((z+n))⊗=((z)),\quad\hjust{integer }n;⊗(9)\cr
\noalign{\vskip 4pt}
((nz))⊗=((z))+\left(\left(z+{1\over n}\right)\right)+\cdots+
\left(\left(z+{n-1\over n}\right)\right),\quad\hjust{integer }n≥1.⊗(10)\cr}$$
(See exercise 2.)
\topinsert{\vskip 25mm
\ctrline{\caption Fig. 7. The sawtooth function $((z))$.}}
In order to get some practice working with these
functions, let us prove Theorem P again, this time without relying
on exercise 1.2.4--37. With the help of Eqs.\ (7), (8), (9),
we can show that
$$\eqalignno{\left\lceil x-s(x)\over m\right\rceil⊗=
{x - s(x)\over m} - \left(\left(x-s(x)\over m\right)\right)+{1\over2}
-{1\over 2}\delta\left(x-s(x)\over m\right)\cr\noalign{\vskip 4pt}
⊗= {x - s(x)\over m} - \left(\left(x-(ax+c)\over m\right)\right)+{1\over2}\cr
\noalign{\vskip 4pt}
⊗= {x - s(x)\over m} + \left(\left(bx+c\over m\right)\right)+{1\over2}⊗(11)\cr}$$
since $\biglp x - s(x)\bigrp /m$ is never an integer. Now
$$\sum ↓{0≤x<m} {x - s(x)\over m} = 0$$
since both $x$ and $s(x)$ take on each value of $\{0, 1, \ldotss
, m - 1\}$ exactly once; hence (11) yields
$$\sum ↓{0≤x<m} \left\lceil x - s(x)\over m\right\rceil = \sum ↓{0≤x<m}
\left(\left(bx+c\over m\right)\right)+{m\over 2}.\eqno(12)$$
Let $b = b↓0d$, $m = m↓0d$, where $b↓0$
and $m↓0$ are relatively prime. We know that $(b↓0x)\mod m↓0$
takes on the values $\{0, 1, \ldotss , m↓0 - 1\}$ in some order
as $x$ varies from 0 to $m↓0 - 1$. By (9) and (10) and the fact
that
$$\left(\left(b(x+m↓0)+c\over m\right)\right)
=\left(\left(bx+c\over m\right)\right)$$
we have
$$\eqalignno{\sum↓{0≤x<m}\left(\left(bx+c\over m\right)\right)⊗=
d\sum↓{0≤x<m↓0}\left(\left(bx+c\over m\right)\right)\cr\noalign{\vskip 4pt}
⊗=d\sum↓{0≤x<m↓0}\left(\left({c\over m}+{b↓0x\over m}\right)\right)=
d\left(\left(c\over d\right)\right).⊗(13)\cr}$$
Theorem P follows immediately from (12) and (13).
One consequence of Theorem P is that practically
{\sl any} choice of $a$ and $c$ will give a reasonable probability
that $X↓{n+1} < X↓n$, at least over the entire period, except
those which have large $d$. A large value of $d$ corresponds
to low potency, and we already know that generators of low potency
are undesirable.
The next theorem gives us a more stringent condition
for the choice of $a$ and $c$; we will consider the {\sl serial
correlation test} applied over the entire period. The quantity
$C$ defined in Section 3.3.2, Eq.\ (23), is
$$C = \biggglp m\sum ↓{0≤x<m}xs(x) - \bigglp\sum
↓{0≤x<m}x\biggrp↑2\bigggrp\left/\biggglp m\sum↓{0≤x<m}x↑2-\bigglp\sum↓{0≤x<m}x
\biggrp↑2\bigggrp\right..\eqno(14)$$
Let $x↑\prime$ be the element such that $s(x↑\prime
) = 0$. We have
$$s(x) = m\left(\left(ax + c\over m\right)\right)+{m\over 2},
\qquad\hjust{if }x≠x↑\prime.\eqno(15)$$
The formulas we are about to derive can be expressed
most easily in terms of the function
$$\sigma (h, k, c) = 12\sum ↓{0≤j<k} \left(\left(j\over k\right)\right)
\left(\left(hj+c\over k\right)\right),\eqno(16)$$
an important function that arises in several
mathematical problems. It is called a {\sl generalized Dedekind
sum}, since Richard Dedekind introduced the function $\sigma
(h, k, 0)$ in 1876 when commenting on one of Riemann's incomplete
manuscripts. [See B. Riemann's {\sl Gesammelte Math.\ Werke},
2nd ed. (1892), 466--478.]
Using the well-known formulas
$$\sum ↓{0≤x<m}x = {m(m - 1)\over 2}\qquad\hjust{and}\qquad \sum
↓{0≤x<m}x↑2 = {m(m - 1)(2m - 1)\over 6} ,$$
it is a straightforward matter to transform
Eq.\ (14) into
$$C = {m\sigma (a, m, c) - 3 + 6(m - x↑\prime - c)\over m↑2
- 1} .\eqno (17)$$
(See exercise 5.) Since $m$ is usually very large,
we may discard terms of order $1/m$, and we have the approximation
$$C \approx \sigma (a, m, c)/m,\eqno (18)$$
with an error of less than $6/m$ in absolute value.
The serial correlation test now reduces to determining
the value of the Dedekind sum $\sigma (a, m, c)$. Evaluating
$\sigma (a, m, c)$ directly from its definition (16) is hardly
any easier than evaluating the correlation coefficient itself
directly, but fortunately there are simple methods available
for computing Dedekind sums quite rapidly.
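% [An illustrative Python fragment, added editorially; not part of the
% manuscript.]  Definition (16) translates directly into code; this O(k)
% brute force is shown only as a baseline for the faster evaluation
% developed below, and, with (18), it yields C =~ sigma(a,m,c)/m for a
% modulus small enough to sum over.  The toy parameters are arbitrary.
%
%   from fractions import Fraction as F
%   from math import floor
%
%   def saw(z):                                   # ((z)), Eq. (7)
%       return F(0) if z == floor(z) else z - floor(z) - F(1, 2)
%
%   def sigma_def(h, k, c):                       # Eq. (16), O(k) work
%       return 12 * sum(saw(F(j, k)) * saw(F(h * j + c, k)) for j in range(k))
%
%   m, a, c = 512, 173, 261
%   print(sigma_def(a, m, c), float(sigma_def(a, m, c)) / m)   # sigma, and (18)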
\algbegin Lemma B (``Reciprocity law''
for Dedekind sums). {\sl Let $h$, $k$, $c$ be integers. If $0 ≤ c < k$, $0
< h ≤ k$, and if $h$ is relatively prime to $k$, then
$$\sigma (h, k, c) + \sigma (k, h, c) = {h\over k} + {k\over
h} + {1\over hk} + {6c↑2\over hk} - 6\left\lfloor c\over h\right\rfloor
- 3e(h, c),\eqno (19)$$
where
$$e(h,c)=\left\{\vcenter{\halign{$#$⊗\qquad if $#$⊗\ctr{ # }⊗$#$\cr
1,⊗c=0⊗or⊗c\mod h≠0;\cr
0,⊗c>0⊗and⊗c\mod h=0.\cr}}\right.\eqno(20)$$}
%folio 95 galley 12 (C) Addison-Wesley 1978 *
\proofbegin We leave it to the reader
to prove that, under these hypotheses,
$$\sigma (h, k, c) + \sigma (k, h, c) = \sigma (h, k, 0) +
\sigma (k, h, 0) + {6c↑2\over hk} - 6\left\lfloor c\over h\right\rfloor - 3e(h,
c) + 3.\eqno (21)$$
(See exercise 6.) Thus the lemma needs to be proved
only in the case $c = 0$.
The proof we will give, based on complex roots
of unity, is essentially due to L. Carlitz. There is actually
a simpler proof which uses only elementary manipulations of
sums (see exercise 7)---but the following method reveals more of
the mathematical tools which are available for problems of this
kind and it is therefore much more instructive.
Let $f(x)$ and $g(x)$ be polynomials defined as
follows:
$$\eqalign{f(x)⊗ = 1 + x +\cdots + x↑{k-1} = (x↑k
- 1)/(x - 1)\cr\noalign{\vskip 4pt}
g(x) ⊗= x + 2x↑2 +\cdots
+ (k - 1)x↑{k-1} = xf↑\prime (x)\cr\noalign{\vskip 4pt}
⊗= kx↑k/(x - 1) - x(x↑k - 1)/(x - 1)↑2.\cr}\eqno(22)$$
If $\omega$ is the complex $k$th root of unity $e↑{2πi/k}$,
we have by Eq.\ 1.2.9--13
$${1\over k} \sum ↓{0≤j<k}\omega ↑{-jr}g(\omega
↑jx) = rx↑r,\qquad\hjust{if }0 ≤ r < k.\eqno (23)$$
Set $x = 1$; then $g(\omega ↑jx) = k/(\omega ↑j
- 1)$ if $j ≠ 0$, and it equals $k(k - 1)/2$ if $j = 0$; therefore
$$r \mod k = \sum ↓{0<j<k} {\omega ↑{-jr}\over \omega ↑j -
1} + \textstyle{1\over 2}(k - 1),\qquad\hjust{if $r$ is an integer.}$$
$\biglp$Eq.\ (23) shows that the right-hand side
equals $r$ when $0 ≤ r < k$, and it is unchanged when multiples
of $k$ are added to $r$.$\bigrp$ Hence
$$\left(\left(r\over k\right)\right)=
{1\over k} \sum ↓{0<j<k} {\omega ↑{-jr}\over
\omega ↑j - 1} - {1\over 2k} + {1\over 2} \delta \left(r\over
k\right) .\eqno (24)$$
This important formula, which holds whenever
$r$ is an integer, allows us to reduce many calculations involving
$((r/k))$ to sums involving $k$th roots of unity,
and it brings a whole new range of techniques into the picture.
In particular, we get the following formula:
$$\sigma (h, k, 0) + {3(k - 1)\over k↑2} = {12\over k↑2} \sum
↓{0<r<k}\, \sum ↓{0<i<k}\, \sum ↓{0<j<k} {\omega ↑{-ir}\over \omega
↑i - 1} {\omega ↑{-jhr}\over \omega ↑j - 1} .\eqno (25)$$
The right-hand side of this formula may be simplified
by carrying out the sum on $r$; we have $\sum ↓{0≤r<k}\omega
↑{rs} = f(\omega ↑s) = 0$ if $s \mod k ≠ 0$. Equation (25)
now reduces to
$$\eqalignno{\noalign{\vskip 3pt}
\sigma (h, k, 0) + {3(k - 1)\over k} ⊗= {12\over k} \sum ↓{0<j<k}
{1\over (\omega ↑{-jh} - 1)(\omega ↑j - 1)} .⊗(26)\cr}$$
A similar formula is obtained for $\sigma (k, h,
0)$, with $\zeta = e↑{2πi/h}$ replacing $\omega $.
It is not obvious what we can do with
the sum in (26), but there is an elegant way to proceed, based
on the fact that each term of the sum is a function of $\omega
↑j$, where $0 < j < k$; hence the sum is essentially taken over the
$k$th roots of unity other than 1. Whenever $x↓1$, $x↓2$, $\ldotss$,
$x↓n$ are distinct complex numbers, we have the identity
$$\halign{\hjust to size{#}\cr
\quad$\dispstyle\sum ↓{1≤j≤n} {1\over (x↓j - x↓1) \ldotsm (x↓j
- x↓{j-1})(x - x↓j)(x↓j - x↓{j+1}) \ldotsm (x↓j - x↓n)}$\hfill\cr
\noalign{\penalty1000}
\hfill$\dispstyle={1\over (x - x↓1) \ldotsm (x - x↓n)}$ ,\quad
(27)\cr}$$
which follows from the usual method of expanding
the right-hand side into partial fractions. Moreover, if $q(x)
= (x - y↓1)(x - y↓2) \ldotsm (x - y↓m)$, we have
$$q↑\prime (y↓j) = (y↓j - y↓1) \ldotsm (y↓j - y↓{j-1})(y↓j -
y↓{j+1}) \ldotsm (y↓j - y↓m);\eqno (28)$$
this identity may often be used to simplify expressions
like those in the left-hand side of (27). When $h$ and $k$ are
relatively prime, the numbers $\omega$, $\omega ↑2$, $\ldotss $,
$\omega ↑{k-1}$, $\zeta$, $\zeta ↑2$, $\ldotss$, $\zeta ↑{h-1}$ are all distinct;
we can therefore consider (27) in the special case of the polynomial
$(x - \omega ) \ldotsm (x - \omega ↑{k-1})(x - \zeta ) \ldotsm (x
- \zeta ↑{h-1}) = (x↑k - 1)(x↑h - 1)/(x - 1)↑2$, obtaining the
following identity in $x$:
$$\chop{10pt}{{1\over h} \sum ↓{0<j<h} {\zeta ↑j(\zeta ↑j - 1)↑2\over (\zeta
↑{jk} - 1)(x - \zeta ↑j)} + {1\over k} \sum ↓{0<j<k} {\omega
↑j(\omega ↑j - 1)↑2\over (\omega ↑{jh} - 1)(x - \omega ↑j)}
= {(x - 1)↑2\over (x↑h - 1)(x↑k - 1)}.}\eqno(29)$$
This identity has many interesting consequences,
and it leads to numerous reci\-procity formulas for sums of the
type given in Eq.\ (26). For example, if we differentiate (29)
twice with respect to $x$ and let $x → 1$, we find that
$$\halign{\hjust to size{#}\cr
\quad$\dispstyle {2\over h} \sum ↓{0<j<h} {\zeta ↑j(\zeta ↑j - 1)↑2\over
(\zeta ↑{jk} - 1)(1 - \zeta ↑j)↑3} + {2\over k} \sum ↓{0<j<k}
{\omega ↑j(\omega ↑j - 1)↑2\over (\omega ↑{jh} - 1)(1 - \omega
↑j)↑3}$\hfill\cr\noalign{\penalty1000\vskip-2pt}
\hfill$\dispstyle
={1\over 6}\left({h\over k} + {k\over
h} + {1\over hk}\right) + {1\over 2} - {1\over 2h} - {1\over
2k} .\quad$\cr}$$
Replace $j$ by $h - j$ and by $k - j$
in these sums and use (26) to get
$$\halign{\hjust to size{#}\cr
\quad$\dispstyle {1\over 6}\left(\sigma (k, h, 0) + {3(h
- 1)\over h}\right) + {1\over 6}\left(\sigma (h, k, 0) + {3(k -
1)\over k}\right)$\hfill\cr\noalign{\penalty1000}
\hfill$\dispstyle
={1\over 6}\left({h\over k} + {k\over
h} + {1\over hk}\right) + {1\over 2} - {1\over 2h} - {1\over
2k} ,\quad$\cr}$$
which is equivalent to the desired result.\quad\blackslug
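% [An illustrative Python check, added editorially; not part of the
% manuscript.]  The reciprocity law (19), with e(h,c) as in (20), can be
% verified exactly for all small parameters satisfying the hypotheses of
% Lemma B:
%
%   from fractions import Fraction as F
%   from math import floor, gcd
%
%   def saw(z):                                   # ((z)), Eq. (7)
%       return F(0) if z == floor(z) else z - floor(z) - F(1, 2)
%
%   def sigma(h, k, c):                           # Eq. (16)
%       return 12 * sum(saw(F(j, k)) * saw(F(h * j + c, k)) for j in range(k))
%
%   def e(h, c):                                  # Eq. (20)
%       return 1 if c == 0 or c % h != 0 else 0
%
%   for k in range(1, 21):
%       for h in range(1, k + 1):
%           if gcd(h, k) != 1:
%               continue
%           for c in range(k):                    # 0 <= c < k, 0 < h <= k
%               lhs = sigma(h, k, c) + sigma(k, h, c)
%               rhs = (F(h, k) + F(k, h) + F(1, h * k) + F(6 * c * c, h * k)
%                      - 6 * (c // h) - 3 * e(h, c))
%               assert lhs == rhs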
\yyskip Lemma B gives us an explicit function $f(h, k,
c)$ such that
$$\sigma (h, k, c) = f(h, k, c) - \sigma (k, h, c)\eqno (30)$$
whenever $0 < h ≤ k$, $0 ≤ c< k$, and $h$ is relatively
prime to $k$. From the definition (16) it is clear that
$$\sigma (k, h, c) = \sigma (k \mod h,\, h,\, c \mod h).\eqno
(31)$$
Therefore we can use (30) iteratively to evaluate
$\sigma (h, k, c)$, using a process that reduces the parameters as in
Euclid's algorithm.
Further simplifications occur when we examine this
iterative procedure more closely. Let us set $m↓1 = k$, $m↓2 =
h$, $c↓1 = c$, and form the following tableau:
$$\vcenter{\halign{\rt{$#$}⊗\lft{$\;#$}\qquad⊗\rt{$#$}⊗\lft{$\;#$}\cr
m↓1 ⊗ = a↓1m↓2 + m↓3⊗ c↓1⊗= b↓1m↓2 + c↓2\cr
m↓2⊗ = a↓2m↓3 + m↓4⊗ c↓2⊗= b↓2m↓3 + c↓3\cr m↓3⊗ = a↓3m↓4
+ m↓5⊗ c↓3⊗= b↓3m↓4 + c↓4\cr m↓4⊗= a↓4m↓5⊗ c↓4
⊗= b↓4m↓5 + c↓5\cr}}\eqno (32)$$
Here
$$\vcenter{\halign{\lft{$#$}\qquad⊗\lft{$#$}\cr
a↓j = \lfloor m↓j/m↓{j+1}\rfloor ,⊗b↓j = \lfloor
c↓j/m↓{j+1}\rfloor ,\cr\noalign{\vskip 4pt} m↓{j+2} = m↓j \mod m↓{j+1},⊗
c↓{j+1} = c↓j \mod m↓{j+1},\cr}}\eqno(33)$$
and it follows that
$$0 ≤ m↓{j+1}< m↓j,\qquad 0 ≤ c↓j < m↓j.\eqno (34)$$
We have assumed for convenience that Euclid's algorithm
terminates in (32) after four iterations; this assumption will
reveal the pattern which holds in the general case. Since $h$
and $k$ were relatively prime to start with, we must have $m↓5
= 1$ and $c↓5 = 0$ in (32).
%folio 98 galley 13 (C) Addison-Wesley 1978 *
Let us further assume that $c↓3 ≠ 0$ but $c↓4 =
0$, in order to get a feeling for the effect this has on the
recurrence. Equations (30) and (31) yield
$$\eqalign{\sigma (h, k, c) ⊗= \sigma (m↓2, m↓1, c↓1)\cr
⊗= f(m↓2, m↓1, c↓1) - \sigma (m↓3, m↓2, c↓2)\cr
⊗= \cdots\cr
⊗= f(m↓2, m↓1, c↓1) - f(m↓3, m↓2, c↓2) + f(m↓4, m↓3,
c↓3) - f(m↓5, m↓4, c↓4).\cr}$$
The first part ``$h/k + k/h$'' of the formula for $f(h,
k, c)$ in (19) contributes
$${m↓2\over m↓1} + {m↓1\over m↓2} - {m↓3\over m↓2} - {m↓2\over m↓3}
+ {m↓4\over m↓3} + {m↓3\over m↓4}
- {m↓5\over m↓4} - {m↓4\over m↓5}$$
to the total, and this simplifies to
$$\halign{\hjust to size{#}\cr
\quad$\dispstyle{h\over k} + \left(a↓1 + {m↓3\over m↓2}\right) - {m↓3\over m↓2}
- \left(a↓2 + {m↓4\over m↓3}\right) + {m↓4\over m↓3}
+ \left(a↓3 + {m↓5\over m↓4}\right) - {m↓5\over m↓4} - a↓4$\hfill\cr
\noalign{\penalty1000\vskip4pt}\hfill$= h/k + a↓1 - a↓2 + a↓3 - a↓4$.\quad\cr}$$
The next part ``$1/hk$'' of (19) also leads to
a simple contribution; according to Eq.\ 4.5.3--9 and other formulas
in Section 4.5.3, we have
$$1/m↓1m↓2 - 1/m↓2m↓3 + 1/m↓3m↓4 - 1/m↓4m↓5 = h↑\prime /k -
1,\eqno (35)$$
where $h↑\prime$ is the unique integer satisfying
$$h↑\prime h ≡ 1 \modulo {k},\quad 0 < h↑\prime < k.\eqno(36)$$
Adding up all the contributions, and remembering
our assumption that $c↓4 = 0$ $\biglp$so that $e(m↓4, c↓3) = 0$, cf.\
(20)$\bigrp$, we find that
$$\eqalign{\sigma (h, k, c)⊗ = {h + h↑\prime \over k} + (a↓1 - a↓2
+ a↓3 - a↓4) - 6(b↓1 - b↓2 + b↓3 - b↓4)\cr
⊗\qquad\null+6\left({c
↓1↑2\over m↓1m↓2} - {c↓2↑2\over m↓2m↓3} + {c↓3↑2\over
m↓3m↓4} - {c↓4↑2\over m↓4m↓5}\right) + 2,\cr}$$
in terms of the assumed tableau (32). Similar results
hold in general:
\thbegin Theorem D. {\sl Let $h$, $k$, $c$ be integers with
$0 < h ≤ k$, $0 ≤ c < k$, and $h$ relatively prime to $k$. Form the
``Euclidean tableau'' as defined in $(33)$ above, and assume that the
process stops after $t$ steps with $m↓{t+1} = 1$. Let $s$ be the smallest
subscript such that $c↓s = 0$, and let $h↑\prime$ be defined by
$(36)$. Then
$$\twoline{
\sigma (h, k, c) = {h + h↑\prime \over k} + \sum ↓{1≤j≤t}(-1)↑{j+1}\left(a↓j
- 6b↓j + 6 {c↑{2}↓j\over m↓j m↓{j+1}}\right)}{0pt}{\null + 3\biglp (-1)↑s + \delta
↓{s1}\bigrp - 2 + (-1)↑t.\quad\blackslug}$$}
\yyskip Euclid's algorithm is analyzed carefully
in Section 4.5.3; the quantities $a↓1$, $a↓2$, $\ldotss$, $a↓t$ are
called the {\sl partial quotients} of $h/k$. Theorem 4.5.3F
tells us that the number of iterations, $t$, will never exceed
$\log↓\phi k$; hence Dedekind sums can be evaluated rapidly. The
terms $c↑{2}↓{j}/m↓jm↓{j+1}$ can be simplified further, and
an efficient algorithm for evaluating $\sigma (h, k, c)$ appears
in exercise 17.
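% [An illustrative Python transcription of Theorem D, added editorially; not
% part of the manuscript, and not the integer-only algorithm requested in
% exercise 17.]  It builds the tableau (32)-(33) and is cross-checked, with
% exact rational arithmetic, against definition (16) for small parameters.
%
%   from fractions import Fraction as F
%   from math import floor, gcd
%
%   def saw(z):
%       return F(0) if z == floor(z) else z - floor(z) - F(1, 2)
%
%   def sigma_def(h, k, c):                      # Eq. (16), O(k) work
%       return 12 * sum(saw(F(j, k)) * saw(F(h * j + c, k)) for j in range(k))
%
%   def sigma_fast(h, k, c):                     # Theorem D, O(log k) steps
%       assert 0 < h <= k and 0 <= c < k and k > 1 and gcd(h, k) == 1
%       ms, cs, aa, bb = [k, h], [c], [], []     # m_1,m_2,...; c_1,...; a_j; b_j
%       while True:                              # the tableau (32)-(33)
%           mj, mj1, cj = ms[-2], ms[-1], cs[-1]
%           aa.append(mj // mj1)
%           bb.append(cj // mj1)
%           ms.append(mj % mj1)
%           cs.append(cj % mj1)
%           if ms[-1] == 0:                      # t steps done; ms[-2] = 1
%               break
%       t = len(aa)
%       s = min(j for j in range(1, t + 2) if cs[j - 1] == 0)
%       hp = pow(h, -1, k)                       # h' of Eq. (36); Python >= 3.8
%       total = F(h + hp, k) - 2 + (-1) ** t
%       total += 3 * ((-1) ** s + (1 if s == 1 else 0))
%       for j in range(1, t + 1):
%           term = aa[j - 1] - 6 * bb[j - 1]
%           term += F(6 * cs[j - 1] ** 2, ms[j - 1] * ms[j])
%           total += (-1) ** (j + 1) * term
%       return total
%
%   for k in range(2, 30):
%       for h in range(1, k):
%           if gcd(h, k) == 1:
%               for c in range(k):
%                   assert sigma_fast(h, k, c) == sigma_def(h, k, c)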
\yskip Now that we have analyzed generalized Dedekind
sums, let us apply our knowledge to the determination of serial
correlation coefficients.
%folio 102 galley 1 (C) Addison-Wesley 1978 *
\thbegin Example 1. {\sl Find the
serial correlation when $m = 2↑{35}$, $a = 2↑{34} + 1$, $c = 1$.}
\penalty25\vskip 6pt plus 12pt minus 4pt\noindent{\sl Solution.}\xskip
We have
$$C = \biglp 2↑{35}\sigma (2↑{34} + 1, 2↑{35}, 1) - 3 + 6(2↑{35}
- (2↑{34} - 1) - 1)\bigrp /(2↑{70} - 1)$$
by Eq.\ (17). To evaluate $\sigma (2↑{34} + 1, 2↑{35},
1)$, we can form the tableau
$$\vcenter{\halign{\rt{$#$}⊗\lft{$\;#$}\qquad⊗\rt{$ #$}⊗\lft{$\;#$}\qquad
⊗\rt{$#$}⊗\lft{$\;#$}\qquad⊗\rt{$ #$}⊗\lft{$\;#$}\cr
m↓1 ⊗= 2↑{35}⊗⊗⊗ c↓1 ⊗= 1\cr
m↓2 ⊗= 2↑{34} + 1⊗ a↓1 ⊗= 1⊗ c↓2 ⊗=1⊗ b↓1 ⊗= 0\cr
m↓3 ⊗= 2↑{34} - 1⊗ a↓2 ⊗= 1⊗ c↓3 ⊗=1⊗ b↓2 ⊗= 0\cr
m↓4 ⊗= 2⊗ a↓3 ⊗= 2↑{33} - 1⊗ c↓4 ⊗=1⊗ b↓3 ⊗= 0\cr
m↓5 ⊗= 1⊗ a↓4 ⊗= 2⊗ c↓5 ⊗= 0⊗ b↓4⊗= 1\cr}}$$
Since $h↑\prime = 2↑{34} + 1$, the value according to
Theorem D comes to $2↑{33} - 3 + 2↑{-32}$. Thus
$$\textstyle C = (2↑{68} + 5)/(2↑{70} - 1) = {1\over 4} + \epsilon ,\qquad
|\epsilon | < 2↑{-67}.\eqno (37)$$
Such a correlation is much, much too high for randomness.
Of course, this generator has very low potency, and we have
already rejected it as nonrandom.
\thbegin Example 2. {\sl Find the approximate
serial correlation when $m = 10↑{10}$, $a = 10001$, $c = 2113248653$.}
\penalty25\vskip 6pt plus 12pt minus 4pt\noindent{\sl Solution.}\xskip
We have $C \approx \sigma
(a, m, c)/m$, and the computation proceeds as follows:
$$\halign{\hjust to size{\hfill$#$\hfill}\cr
\vcenter{\halign{\rt{$#$}⊗$\;#\;$⊗\rt{#}\qquad⊗\rt{$ #$}⊗{$\;#\;$}⊗\rt{#}\qquad
⊗\rt{$#$}⊗$\;#\;$⊗\rt{#}\qquad⊗\rt{$ #$}⊗{$\;#\;$}⊗\rt{#}\cr
m↓1 ⊗=⊗10000000000⊗⊗⊗⊗ c↓1 ⊗=⊗2113248653\cr
m↓2 ⊗=⊗10001⊗a↓1 ⊗=⊗999900⊗c↓2 ⊗=⊗7350⊗b↓1 ⊗=⊗211303\cr
m↓3 ⊗=⊗100⊗a↓2⊗=⊗100⊗c↓3⊗=⊗50⊗b↓2⊗=⊗73\cr
m↓4 ⊗=⊗1⊗a↓3 ⊗=⊗100⊗c↓4⊗=⊗0⊗b↓3⊗=⊗50\cr}}\cr
\noalign{\penalty1000\vskip10pt}
\sigma (m↓2, m↓1, c↓1) = -31.6926653544;\cr
\noalign{\penalty1000\vskip4pt}
\hjust to size{\hfill$C \approx -3 \cdot 10↑{-9}.$\hfill(38)}\cr}$$
This is a very respectable value of $C$ indeed.
But the generator has a potency of only 3, {\sl so it is not
really a very good source of random numbers in spite of the
fact that it has low serial correlation.} It is necessary to
have a low serial correlation, but not sufficient.
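% [An illustrative Python check, added editorially; not part of the
% manuscript.]  For a modulus small enough to enumerate, Eq. (14) can be
% evaluated exactly and compared with the Dedekind-sum form (17) and the
% approximation (18).  The toy full-period generator below is arbitrary;
% sigma is computed straight from definition (16) here, though Theorem D
% gives the same value much faster.
%
%   from fractions import Fraction as F
%   from math import floor
%
%   def saw(z):
%       return F(0) if z == floor(z) else z - floor(z) - F(1, 2)
%
%   def sigma(h, k, c):                           # Eq. (16)
%       return 12 * sum(saw(F(j, k)) * saw(F(h * j + c, k)) for j in range(k))
%
%   m, a, c = 512, 173, 261
%   s = [(a * x + c) % m for x in range(m)]
%   num = m * sum(x * s[x] for x in range(m)) - sum(range(m)) ** 2
%   den = m * sum(x * x for x in range(m)) - sum(range(m)) ** 2
%   C = F(num, den)                               # Eq. (14), exactly
%   xp = s.index(0)                               # x' with s(x') = 0
%   assert C == (m * sigma(a, m, c) - 3 + 6 * (m - xp - c)) / F(m * m - 1)
%   print(float(C), float(sigma(a, m, c)) / m)    # exact C versus estimate (18)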
\thbegin Example 3. {\sl Estimate the serial
correlation for general $a$, $m$, $c$.}\xskip If we consider just one application
of (30), we have
$$\sigma (a, m, c) \approx {m\over a} + 6 {c↑2\over am} - 6
{c\over a} - \sigma (m, a, c).$$
Now $|\sigma (m, a, c)| < a$ by exercise 12, and
therefore
$$C \approx {\sigma (a, m, c)\over m} \approx {1\over a} \left(1
- 6 {c\over m} + 6 \left(c\over m\right)↑2\right).\eqno (39)$$
The error in this approximation is less than $(a
+ 6)/m$ in absolute value.
The estimate in (39) was the first theoretical result known
about the randomness of congruential generators. R. R. Coveyou
[{\sl JACM \bf 7} (1960), 72--74] obtained it by averaging over
all {\sl real} numbers $x$ between 0 and $m$ instead of considering
only the integer values (cf.\ exercise 21); then Martin Greenberger
[{\sl Math.\ Comp.\ \bf 15} (1961), 383--389] gave a rigorous
derivation including an estimate of the error term.
So began one of the saddest chapters in the history of computer
science! Although the above approximation is quite correct,
it has been grievously misapplied in practice; people abandoned
the perfectly good generators they had been using and replaced
them by terrible generators which looked good from the standpoint
of (39). For more than a decade, the most common random-number
generators in daily use were seriously deficient, solely because
of a theoretical advance. A little knowledge is a dangerous
thing.
If we are to learn by past mistakes, we had better look carefully
at how (39) has been misused. In the first place people assumed
uncritically that a small serial correlation over the whole
period would be a pretty good guarantee of randomness; but in
fact it doesn't even ensure a small serial correlation for 1000
consecutive elements of the sequence (see exercise 14).
Secondly, (39) and its error term will ensure a relatively small
value of $C$ only when $a \approx \sqrt{m}$;
therefore people suggested choosing multipliers near
$\sqrt{m}$. In fact, we shall see that nearly all multipliers
give a value of $C$ that is substantially less than $1/
\sqrt{m}$, hence (39) is not a very good approximation
to the true behavior. Minimizing a crude upper bound for $C$
does not minimize $C$.
In the third place, people observed that
(39) yields its best estimate when $c/m \approx {1\over 2} \pm
{1\over 6}\sqrt{3}$, since these values are
the roots of $1 - 6x + 6x↑2 = 0$. ``In the absence of any other
criterion for choosing $c$, we might as well use this one.'' The
latter statement is not incorrect, but it is misleading at best,
since experience has shown that the value of $c$ has hardly
any influence on the true value of the serial correlation when
$a$ is a good multiplier; the choice $c/m \approx {1\over 2}
\pm {1\over 6}\sqrt{3}$ reduces $C$ substantially
only in cases like Example 2 above. And we are fooling ourselves
in such cases, since the bad multiplier will reveal its deficiencies
in other ways.
Clearly we need a better estimate than (39); and such an estimate
is now available thanks to Theorem D, which stems principally
from the work of U. Dieter [{\sl Math.\ Comp.\ \bf 25} (1971),
855--883]. Theorem D implies that $\sigma (a, m, c)$ will be
small {\sl if the partial quotients of a/m are small.} Indeed,
by analyzing generalized Dedekind sums still more closely, it
is possible to obtain quite a sharp estimate:
%folio 105 galley 2 (C) Addison-Wesley 1978 *
\thbegin Theorem K. {\sl Under the
assumptions of Theorem D, we always have}
$$\def\\#1{\sum↓{\scriptstyle1≤j≤t\atop\scriptstyle j\,#1}a↓j}
-{1\over2}\\{\char'157\char'144\char'144}-\\{\char'145\char'166\char'145\char'156}
+{1\over2}≤\sigma(h,k,c)≤\\{\char'157\char'144\char'144}+{1\over2}
\\{\char'145\char'166\char'145\char'156}-{1\over 2}.\eqno(40)$$
\proofbegin See D. E. Knuth, {\sl Acta
Arithmetica \bf33} (1978), 297--325, where it is shown further that these
bounds are essentially the best possible when there are large
partial quotients.\quad\blackslug
\thbegin Example 4. {\sl Estimate the serial
correlation for\xskip $a = 3141592621$,\xskip $m = 2↑{35}$,
$c$\penalty1000\ odd.}\xskip The partial
quotients of $a/m$ are 10, 1, 14, 1, 7, 1, 1, 1, 3, 3, 3, 5,
2, 1, 8, 7, 1, 4, 1, 2, 4, 2; hence by Theorem K
$$-55 ≤ \sigma (a, m, c) ≤ 68,$$
and the serial correlation is guaranteed to be
extremely low for all $c$.
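% [An illustrative Python fragment, added editorially; not part of the
% manuscript.]  The partial quotients quoted above, and the two bounds of
% Theorem K, can be reproduced mechanically:
%
%   def partial_quotients(h, k):        # a_1, ..., a_t of h/k, as in (33)
%       aa = []
%       while True:
%           aa.append(k // h)
%           k, h = h, k % h
%           if h == 0:
%               return aa               # the last divisor was m_{t+1} = 1
%
%   a, m = 3141592621, 2 ** 35
%   aa = partial_quotients(a, m)
%   odd  = sum(aa[0::2])                # a_1 + a_3 + ...
%   even = sum(aa[1::2])                # a_2 + a_4 + ...
%   print(aa)
%   print(0.5 - odd / 2 - even, odd + even / 2 - 0.5)   # bounds of Eq. (40)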
Note that this bound is considerably better
than we could obtain from (39), since the error in (39) is of
order $a/m$; a ``random'' multiplier has turned out to be much
better than one specifically chosen to look good on the basis
of (39). In fact, it is possible to show that the {\sl average}
value of $\sum ↓{1≤j≤t}a↓j$, taken over all multipliers $a$
relatively prime to $m$, is
$${6\over π↑2} (\ln m)↑2 + O\biglp(\log m)(\log\log m)↑4\bigrp$$
(see exercise 4.5.3--35). Therefore the probability
that a random multiplier has large $\sum ↓{1≤j≤t}a↓j$, say larger
than $(\log m)↑{2+\epsilon}$ for some fixed $\epsilon >0$, approaches
zero as $m → ∞$. This substantiates the empirical evidence that
almost all linear congruential sequences have extremely low
serial correlation over the entire period.
\yskip The exercises below show that other {\sl a priori} tests, such
as the serial test over the entire period, can also be expressed
in terms of a few generalized Dedekind sums. It follows from
Theorem K that linear congruential sequences will pass these
tests provided that certain specified fractions (depending on
$a$ and $m$ but not on $c$) have small partial quotients. In
particular, the result of exercise 19 implies that {\sl the
serial test on pairs will be satisfactorily passed if and only
if\/\ $a/m$ has no large partial quotients.}
The book {\sl Dedekind Sums} by Hans Rademacher and Emil Grosswald
(Math.\ Assoc.\ of America, Carus Monograph No.\ 16, 1972) discusses
the history and properties of Dedekind sums and their generalizations.
Further theoretical tests, including the serial test in higher dimensions,
are discussed in Section 3.3.4.
\exbegin{EXERCISES---First Set}
\exno 1. [M10] Express
``$x \mod y$'' in terms of the sawtooth and $\delta$ functions.
\exno 2. [M20] Prove the ``replicative law,'' Eq.\ (10).
\exno 3. [HM22] What is the Fourier series expansion (in terms
of sines and cosines) of the function $f(x) = ((x))$?
\trexno 4. [M19] If $m = 10↑{10}$, what is the highest possible
value of $d$ (in the notation of Theorem P), given that the
potency of the generator is 10?
\exno 5. [M21] Carry out the derivation of Eq.\ (17).
\exno 6. [M27] Let $hh↑\prime + kk↑\prime = 1$.\xskip (a) Show, without
using Lemma B, that
$$\sigma (h, k, c) = \sigma (h, k, 0) + 12\sum ↓{0<j<c}
\left(\left(h↑\prime j\over k\right)\right)
+ 6 \left(\left(h↑\prime c\over k\right)\right)$$
for all integers $c ≥ 0$.\xskip (b) Show that if $0 <
j < k$,
$$\left(\left(h↑\prime j\over k\right)\right)
+\left(\left(k↑\prime j\over h\right)\right)
={j\over hk} - {1\over 2}\delta \left(j\over h\right).$$
(c) Under the assumptions of Lemma B, prove Eq.\
(21).
\trexno 7. [M24] Give a proof of the reciprocity law (19), when
$c = 0$, by using the general reciprocity
law of exercise 1.2.4--45.
\trexno 8. [M34] (L. Carlitz.)\xskip Let
$$\rho (p, q, r) = 12\sum ↓{0≤j<r}\left(\left(jp\over r\right)\right)\left(\left(
jq\over r\right)\right).$$
By generalizing the method of proof used
in Lemma B, prove the following beautiful identity due to H.
Rademacher: If each of $p, q, r$ is relatively prime to the
other two,
$$\rho (p, q, r) + \rho (q, r, p) + \rho (r, p, q) = {p\over
qr} + {q\over rp} + {r\over pq} - 3.$$
(The reciprocity law for Dedekind sums, with $c
= 0$, is the special case $r = 1$.)
\exno 9. [M40] Is there a simple
proof of Rademacher's identity (exercise 8) along the lines
of the proof in exercise 7 of a special case?
\exno 10. [M20] Show that when $0 < h < k$ it is possible to
express $\sigma (k - h, k, c)$ and $\sigma (h, k, - c)$ easily
in terms of $\sigma (h, k, c)$.
\exno 11. [M30] The formulas given in the
text show us how to evaluate $\sigma (h, k, c)$ when $h$ and
$k$ are relatively prime and $c$ is an integer. For the general
case, prove that
\yskip\hang\textindent{a)}$\sigma (dh, dk, dc) = \sigma (h, k, c)$,\quad
integer $d > 0$;
\yskip\hang\textindent{b)}$\sigma (h, k, c + \theta ) = \sigma (h,
k, c) + 6((h↑\prime c/k))$,\quad integer $c$, real
$0 < \theta < 1$, when $h$ and $k$ are relatively prime and
$hh↑\prime ≡ 1 \modulo k$.
\exno 12. [M24] Show that
if $h$ is relatively prime to $k$ and $c$ is an integer, $|\sigma
(h, k, c)| ≤ (k - 1)(k - 2)/k$.
\exno 13. [M24] Generalize Eq.\ (26) so that it gives an expression
for $\sigma (h, k, c)$.
\trexno 14. [M20] The linear congruential
generator which has $m = 2↑{35}$, $a = 2↑{18} + 1$, $c = 1$, was
given the serial correlation test on three batches of 1000 consecutive
numbers, and the result was a very high correlation, between
0.2 and 0.3, in each case. What is the serial correlation of
this generator, taken over all $2↑{35}$ numbers of the period?
\exno 15. [M21] Generalize Lemma B so that it applies to all
{\sl real} values of $c$, $0 ≤ c < k$.
\exno 16. [M24] Given the Euclidean tableau defined in (33),
let $p↓0 = 1$, $p↓1 = a↓1$, and $p↓j = a↓jp↓{j-1} + p↓{j-2}$ for
$1 < j ≤ t$. Show that the complicated portion of the sum in
Theorem D can be rewritten as follows, making it possible to
avoid noninteger computations:
$$\sum ↓{1≤j≤t} (-1)↑{j+1}{c↑{2}↓j\over m↓jm↓{j+1}} = {1\over
m↓1} \sum ↓{1≤j≤t}(-1)↑{j+1}b↓j(c↓j + c↓{j+1})p↓{j-1}.$$
[{\sl Hint:} Prove that $\sum ↓{1≤j≤r}(-1)↑{j+1}/m↓jm↓{j+1}
= (-1)↑{r+1}p↓{r-1}/m↓1m↓{r+1}$ for $1 ≤ r ≤ t$.]
\exno 17. [M22] Design an algorithm which evaluates $\sigma
(h, k, c)$ for integers $h$, $k$, $c$ satisfying the hypotheses
of Theorem D. Your algorithm should use only integer arithmetic
(of unlimited precision), and it should produce the answer in
the form $A + B/k$ where $A$ and $B$ are integers. (Cf.\ exercise
16.) If possible, use only a finite number of variables for
temporary storage, instead of maintaining arrays such as $a↓1$,
$a↓2$, $\ldotss$, $a↓t$.
\trexno 18. [M23] (U. Dieter.)\xskip Given positive integers $h$, $k$,
$z$ and an integer $c$, let
$$S(h, k, c, z) = \sum ↓{0≤j<z}\left(\left(hj+c\over k\right)\right).$$
Show that this sum can be evaluated in
``closed form,'' in terms of generalized Dedekind sums and the
sawtooth function.\xskip [{\sl Hint:} When $z ≤ k$, the quantity $\lfloor j/k\rfloor
-\lfloor(j-z)/k\rfloor$ equals 1 for $0≤j<z$, and it equals 0 for $z≤j<k$,
so we can introduce this factor and sum over $0≤j<k$.]
%folio 113 galley 3 (C) Addison-Wesley 1978 *
\trexno 19. [M23] Show
that the {\sl serial test} can be analyzed over the full period,
in terms of generalized Dedekind sums, by finding an expression
for the probability that $α ≤ X↓n < β$ and $α↑\prime
≤X↓{n+1} < β↑\prime$ when $α$, $β$, $α↑\prime$, $β↑\prime$
are given integers with $0 ≤ α < β ≤ m$, $0 ≤ α↑\prime < β↑\prime
≤ m$.\xskip [{\sl Hint:} Consider the quantity $\lfloor (x - α)/m\rfloor
- \lfloor (x - β)/m\rfloor$.]
\exno 20. [M29] (U. Dieter.)\xskip Extend Theorem P by obtaining a
formula for the probability that $X↓n > X↓{n+1}
>X↓{n+2}$, in terms of generalized Dedekind sums.
\exbegin{EXERCISES---Second Set}
\noindent In many cases, exact computations with
integers are quite difficult to carry out, but we can attempt
to study the probabilities which arise when we take the average
over all real values of $x$ instead of restricting the calculation
to integer values. Although these results are only approximate,
they shed some light on the subject.
It is convenient to deal with numbers $U↓n$ between
zero and one; for linear congruential sequences, $U↓n = X↓n/m$,
and we have $U↓{n+1} = \{aU↓n + \theta\} $, where $\theta
= c/m$ and $\{x\}$ denotes $x \mod 1$. For example, the formula
for serial correlation now becomes
$$C = \bigglp\int↓0↑1 x\{ax+\theta\}\,dx-\bigglp\int↓0↑1 x\,dx\biggrp↑2\,\biggrp
\vcenter{\hjust{\:@\char'36}} % bigg slash
\bigglp\int↓0↑1 x↑2\,dx-\bigglp\int↓0↑1 x\,dx\biggrp↑2\,\biggrp.$$
\trexno 21. [HM23] (R. R.
Coveyou.)\xskip What is the value of $C$ in the formula just given?
\trexno 22. [M22] Let $a$ be an integer, and let $0 ≤ \theta <
1$. If $x$ is a real number between 0 and 1, and if $s(x) =
\{ax + \theta\}$, what is the probability that $s(x) <
x$? (This is the ``real number'' analog of Theorem P.)
\exno 23. [28] The previous exercise gives the probability that
$U↓{n+1} < U↓n$. What is the probability that $U↓{n+2} < U↓{n+1}
< U↓n$, assuming that $U↓n$ is a random real number between
zero and one?
\exno 24. [M29] Under the assumptions of the preceding problem,
except with $\theta = 0$, show that $U↓n > U↓{n+1} > \cdots >
U↓{n+t-1}$ occurs with probability
$${1\over t!} \left(1 + {1\over a}\right)\ldotsm\left(1 + {t - 2\over
a}\right).$$
What is the average length of a descending run
starting at $U↓n$, assuming that $U↓n$ is selected at random
between zero and one?
\trexno 25. [M25] Let $α$, $β$, $α↑\prime$,
$β↑\prime$ be real numbers with $0 ≤ α < β ≤ 1$, $0 ≤ α↑\prime
< β↑\prime ≤ 1$. Under the assumptions of exercise 22, what
is the probability that $α ≤ x < β$ and $α↑\prime ≤ s(x) < β↑\prime$?
(This is the ``real number'' analog of exercise 19.)
\exno 26. [M21] Consider a ``Fibonacci'' generator, where $U↓{n+1}
= \{U↓n + U↓{n-1}\}$. Assuming that $U↓1$ and $U↓2$ are
independently chosen at random between 0 and 1, find the probability
that $U↓1 < U↓2 < U↓3$, $U↓1 < U↓3 < U↓2$, $U↓2 < U↓1 < U↓3$, etc.\xskip
[{\sl Hint: }Divide the ``unit square,'' i.e., the points
of the plane $\leftset (x, y) \relv 0 ≤ x, y < 1\rightset$, into six parts,
depending on the relative order of $x$, $y$, and $\{x + y\}
$, and determine the area of each part.]
\exno 27. [M32] In the Fibonacci generator of the preceding
exercise, let $U↓0$ and $U↓1$ be chosen independently in the
unit square except that $U↓0 > U↓1$. Determine the probability
that $U↓1$ is the
beginning of an upward run of length $k$, so that $U↓0 > U↓1
< \cdots < U↓k > U↓{k+1}$. Compare this with the corresponding
probabilities for a random sequence.
\exno 28. [M35] According to Eq.\ 3.2.1.3--5, a
linear congruential generator with potency
2 satisfies the condition $X↓{n-1} - 2X↓n + X↓{n+1} ≡ (a - 1)c
\modulo m$. Consider a generator which
abstracts this situation: let $U↓{n+1} = \{α + 2U↓n - U↓{n-1}\}$.
As in exercise 26, divide the unit square into parts which
show the relative order of $U↓1$, $U↓2$, and $U↓3$ for each pair
$(U↓1,U↓2)$. Are there any values of
$α$ for which all six possible orders are achieved with probability
$1\over 6$, assuming that $U↓1$ and $U↓2$ are chosen at
random in the unit square?
\runningrighthead{THE SPECTRAL TEST}
section{3.3.4}
\sectionskip
\sectionbegin{3.3.4. The Spectral Test}
In this section we shall study an especially important way to check
the quality of linear congruential random-number
generators; not only do all good generators pass this
test, but also all generators now known to be bad actually {\sl fail}
it. Thus it is by far the most powerful test known, and it deserves
particular attention. Our discussion will also bring out some
fundamental limitations on the degree of randomness we can expect
from linear congruential sequences and their generalizations.
The spectral test embodies aspects of both the empirical and
theoretical tests studied in previous sections: it is like the
theoretical tests because it deals with
properties of the full period of the sequence, and it is like
the empirical tests because it requires a computer program to
determine the results.
\subsectionbegin{A. Ideas underlying the test} The
most important randomness
criteria seem to rely on properties of the joint distribution
of $t$ consecutive elements of the sequence, and the spectral
test deals directly with this distribution. If we have a sequence
$\langle U↓n\rangle$ of period $m$, the idea is to analyze the
set of all $m$ points
$$\left\{\,(U↓n,U↓{n+1},\ldotss,U↓{n+t-1})\,\right\}\eqno(1)$$
in $t$-dimensional space.
For simplicity we shall assume that we have a linear congruential sequence
$(X↓0,a,c,m)$ of maximum period length $m$ (so that $c≠0$), or that $m$
is prime and $c=0$ and the period length is $m-1$. In the latter case we
shall add the point $(0,0,\ldotss,0)$ to the set (1), so that there are always
$m$ points in all; this extra point has a negligible effect when $m$ is large,
and it makes the theory much simpler. Under these assumptions, (1) can be
rewritten as
$$\left.\left\{{1\over m} \biglp x, s(x), s(s(x)) , \ldotss
, s↑{t-1}(x)\bigrp\;\right|\; 0 ≤ x < m\right\},\eqno (2)$$
where
$$s(x) = (ax + c)\mod m\eqno (3)$$
is the ``successor'' of $x$. Note that we are considering
only the set of {\sl all} such points in $t$ dimensions, not
the order in which those points are actually generated. But the
order of generation is reflected in the dependence between components
of the vectors; and the spectral test studies such dependence
for various dimensions $t$ by dealing with the totality of all
points (2).
For example, Fig.\ 8 shows a typical small case in 2 and 3 dimensions,
for the generator with
$$s(x) = (137x + 187)\mod 256.\eqno (4)$$
Of course a generator with period length 256 will hardly be random, but
256 is small enough that we can draw the diagram and gain some understanding
before we turn to the larger $m$'s which are of practical interest.
\topinsert{\vskip 50mm
\hjust to 62mm{\caption Fig. 8. (a) The two-dimensional grid formed by all
pairs of successive points $(X↓n,X↓{n+1})$, when $X↓{n+1}=(137X↓n+187)\mod256$.}
\vskip 3pt
\hjust to size{\ninepoint (b) The three-dimensional grid of triplets $(X↓n,X↓{n+1},
X↓{n+2})$. [Illustrations courtesy of Bruce G. Baumgart.]}}
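% [An illustrative Python fragment, added editorially; not part of the
% manuscript.]  The point sets of Fig. 8 are easy to regenerate for plotting
% with any convenient tool:
%
%   m, a, c = 256, 137, 187                       # the generator (4)
%   s = lambda x: (a * x + c) % m
%   pairs   = [(x / m, s(x) / m) for x in range(m)]               # Fig. 8(a)
%   triples = [(x / m, s(x) / m, s(s(x)) / m) for x in range(m)]  # Fig. 8(b)
%   print(len(set(pairs)), len(set(triples)))     # 256 distinct points each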
Perhaps the most striking thing about
the pattern of boxes in Fig.\ 8 is that we can cover them all
by a fairly small number of parallel lines; indeed, there are
many different families of parallel lines which will hit all
the points. For example, a set of 20 nearly vertical lines will
do the job, as will a set of 21 lines that tilt upward at roughly
a $30\deg$ angle. We commonly observe similar patterns when driving
past farmlands that have been planted in a systematic manner.
If the same generator is considered in three dimensions, we
obtain 256 points in a cube, formed by appending a ``height''
component $s(s(x))$ to each of the 256 points $\biglp
x, s(x)\bigrp$ in the plane of Fig.\ 8(a), as shown in Fig.\
8(b). Let's imagine that this 3-D crystal structure has been
made into a physical model, a cube which we can turn in our
hands; as we rotate it, we will notice various families of parallel
planes which encompass all of the points. In the words of Wallace
Givens, the random numbers stay ``mainly in the planes.''
At first glance we might think that such systematic behavior
is so nonrandom as to make congruential generators quite worthless;
but more careful reflection, remembering that $m$ is quite large
in practice, provides a better insight. The regular structure
in Fig.\ 8 is essentially the ``grain'' we see when examining
our random numbers under a high-power microscope. If we take
truly random numbers between 0 and 1, and round or truncate
them to finite accuracy so that each is an integer multiple
of $1/\nu$ for some given number $\nu $, then the $t$-dimensional
points (1) we obtain will have an extremely regular character
when viewed through a microscope.
Let $1/\nu ↓2$ be the maximum distance between lines, taken
over all families of parallel straight lines that cover the
points $\left\{\biglp x/m,\, s(x)/m\bigrp\right\}$ in two dimensions. We shall call
$\nu ↓2$ the two-dimensional {\sl accuracy} of the random-number
generator, since the pairs of successive numbers have a fine
structure which is essentially good to one part in $\nu ↓2$.
Similarly, let $1/\nu ↓3$ be the maximum distance between planes,
taken over all families of parallel planes that cover all points
$\left\{\biglp x/m,\, s(x)/m,\, s(s(x))/m\bigrp\right\}$; we shall call $\nu
↓3$ the accuracy in three dimensions. The $t$-dimensional accuracy
$\nu ↓t$ is the reciprocal of the maximum distance between hyperplanes,
taken over all families of parallel $(t - 1)$-dimensional hyperplanes
that cover all points $\left\{\biglp x/m,\, s(x)/m,\, \ldotss ,\,
s↑{t-1}(x)/m\bigrp\right\}$.
The essential difference
between periodic sequences and truly random sequences that have
been truncated to multiples of $1/\nu$ is that the ``accuracy''
of truly random sequences is the same in all dimensions, while
the ``accuracy'' of periodic sequences decreases as $t$ increases.
Indeed, since there are only $m$ points in the $t$-dimensional
cube when $m$ is the period length, we can't achieve a $t$-dimensional
accuracy of more than about $m↑{1/t}$.
%folio 120 galley 4 (C) Addison-Wesley 1978 *
When the independence
of $t$ consecutive values is considered, computer-generated
random numbers will behave essentially as if we took truly random
numbers and truncated them to $\lg \nu ↓t$ bits, where $\nu ↓t$
decreases with increasing $t$. In practice, such varying accuracy
is usually all we need. We don't insist that the 10-dimensional
accuracy be $2↑{35}$, in the sense that all $(2↑{35})↑{10}$ possible
10-tuples $(U↓n, U↓{n+1}, \ldotss , U↓{n+9})$ should be equally
likely on a 35-bit machine; for such large $t$ we only want
a few of the leading bits of $(U↓n, U↓{n+1}, \ldotss , U↓{n+t-1})$
to behave as if they were independently random.
On the other hand when an application demands high resolution
of the random-number sequence, simple linear congruential sequences
will necessarily be inadequate; a generator with larger period
should be used instead, even though only a small fraction of
the period will actually be generated. Squaring the period will
essentially square the accuracy in higher dimensions, i.e.,
it will double the effective number of bits of precision.
The spectral test is based on the values of $\nu ↓t$ for small
$t$, say $2≤t≤6$. Dimensions 2, 3, and 4 seem to be adequate
to detect important deficiencies in a sequence, but since we
are considering the entire period it seems best to be somewhat
cautious and go up into another dimension or two; on the other
hand the values of $\nu ↓t$ for $t≥10$ seem to be of no practical
significance whatever. (This is fortunate, because it appears
to be rather difficult to calculate $\nu ↓t$ when $t≥10$.)
Note that there is a vague relation between the spectral test
and the serial test; for example, a special case of the serial test,
taken over the entire period as in exercise 3.3.3--19,
counts the number of boxes in each of 64 subsquares of Fig.\
8(a). The main difference is that the spectral test rotates
the dots so as to discover the least favorable orientation.
We shall return to a consideration of the serial test later in this section.
\yskip It may appear at first that we should apply the spectral test
only for one suitably high value of $t$; if a generator passes
the test in three dimensions, it seems plausible that it should
also pass the 2-D test, hence we might as well omit the latter.
The fallacy in this reasoning occurs because we apply more stringent
conditions in lower dimensions. A similar situation occurs with
the serial test: Consider a generator which (quite properly)
has almost the same number of points in each subcube of the
unit cube, when the unit cube has been divided into 64 subcubes
of size ${1\over 4} \times {1\over 4} \times {1\over 4}$;
this same generator might yield completely {\sl empty} subsquares
of the unit square, when the unit square has been divided into
64 subsquares of size ${1\over 8} \times {1\over 8}$. Since
we increase our expectations in lower dimensions, a separate
test for each dimension is required.
It is not always true that $\nu ↓t ≤ m↑{1/t}$, although this
upper bound is valid when the points form a rectangular grid.
For example, it turns out that $\nu ↓2 = \sqrt{274}
> \sqrt{256}$ in Fig.\ 8, because a nearly
hexagonal structure brings the $m$ points closer together than
would be possible in a strictly rectangular arrangement.
In order to develop an algorithm that computes $\nu ↓t$ efficiently,
we must look more deeply at the associated mathematical theory.
Therefore a reader who is not mathematically inclined is advised
to skip to part D of this section, where the spectral test is
presented as a ``plug-in'' method accompanied by several examples.
On the other hand, we shall see that the mathematics behind
the spectral test requires only some elementary manipulations
of vectors.
\yskip Some authors have suggested using the minimum number $N↓t$ of
parallel covering lines or hyperplanes as the criterion, instead
of the maximum distance $1/\nu ↓t$ between them. However, this
number does not appear to be as important as the concept of
accuracy defined above, because it is biased by how nearly the
slope of the lines or hyperplanes matches the coordinate axes
of the cube. For example, the 20 nearly vertical lines which
cover all the points of Fig.\ 8 are actually $1/
\sqrt{328}$ units apart, and this might falsely imply an accuracy
of one part in $\sqrt{328}$, or perhaps even
of one part in 20. The true accuracy of only one part in
$\sqrt{274}$ is realized only for the larger family of
21 lines with a slope of 7/15; another family of 24 lines, with
a slope of $- 11/13$, also has a greater inter-line distance than
the 20-line family, since $1/ \sqrt{290} > 1/
\sqrt{328}$. The precise way in which families of lines
act at the boundaries of the unit hypercube does not seem to
be an especially ``clean'' or significant criterion; however,
for those people who prefer to count hyperplanes, it is possible
to compute $N↓t$ using a method quite similar to the way in
which we shall calculate $\nu ↓t$ (see exercise 16).
\subsectionbegin{\star B. Theory behind the test} In order
to analyze the basic set (2), we start with the observation
that
$${1\over m} s↑j(x) = \left(a↑jx + (1 + a + \cdots + a↑{j-1})c\over
m\right) \mod 1.\eqno (5)$$
We can get rid of the ``mod 1'' operation by extending
the set periodically, making infinitely many copies of the original
$t$-dimensional hypercube, proceeding in all directions. This
gives us the set
$$\eqalign{L⊗=\left.\left\{\left({x\over m} + k↓1, {s(x)\over m} + k↓2,
\ldotss , {s↑{t-1}(x)\over m} + k↓t\right)\;\right|\;\hjust{integer }x, k↓1,
k↓2, \ldotss , k↓t\right\}\cr\noalign{\vskip 3pt}
⊗=\left.\left\{\,V↓0 + \left({x\over m} +
k↓1, {ax\over m} + k↓2, \ldotss , {a↑{t-1}x\over m} + k↓t\right)\;\right|\;
\hjust{integer }x, k↓1, k↓2, \ldotss , k↓t\right\},\cr}$$
where
$$V↓0 = {1\over m} \left(0, c, (1 + a)c, \ldotss , (1 + a + \cdots
+a↑{t-2})c\right)\eqno (6)$$
is a constant vector. The variable $k↓1$ is redundant
in this representation of $L$, because we can change $(x, k↓1,
k↓2, \ldotss , k↓t)$ to $(x + k↓1m, 0, k↓2 - ak↓1, \ldotss , k↓t
- a↑{t-1}k↓1)$, reducing $k↓1$ to zero without loss of generality.
Therefore we obtain the comparatively simple formula
$$L = \leftset V↓0 + y↓1V↓1 + y↓2V↓2 + \cdots + y↓tV↓t \,\relv\,
\hjust{integer }y↓1,
y↓2, \ldotss , y↓t\rightset,\eqno (7)$$
where
$$\eqalignno{⊗\hfill V↓1 = {1\over m} (1, a, a↑2, \ldotss , a↑{t-1});⊗(8)\cr
\noalign{\vskip 4pt}
⊗V↓2 = (0, 1, 0, \ldotss , 0),\quad V↓3 = (0, 0, 1, \ldotss
, 0),\quad \ldotss ,\quad V↓t = (0, 0, 0, \ldotss , 1).⊗(9)\cr}$$
The points $(x↓1, x↓2, \ldotss , x↓t)$ of $L$ which
satisfy $0 ≤ x↓j < 1$ for all $j$ are precisely the $m$ points
of our original set (2).
Note that the increment $c$ appears only in $V↓0$, and the effect
of $V↓0$ is merely to shift all elements of $L$ without changing
their relative distances; hence {\sl$c$ does not affect the spectral
test in any way}, and we might as well assume that $V↓0 = (0,
0, \ldotss , 0)$ when we are calculating $\nu ↓t$. When $V↓0$
is the zero vector we have a so-called ``lattice'' of points
$$L↓0=\leftset y↓1V↓1+y↓2V↓2+\cdots+y↓tV↓t\,\relv\,\hjust{integer }y↓1,y↓2,
\ldotss,y↓t\rightset,\eqno(10)$$
and our goal is to study the distances between adjacent $(t-1)$-dimensional
hyperplanes, in families of parallel hyperplanes which cover all the
points of $L↓0$.
%folio 123 galley 5 (C) Addison-Wesley 1978 *
A family of parallel $(t-1)$-dimensional
hyperplanes can be defined by a nonzero vector
$U = (u↓1, \ldotss , u↓t)$ which is perpendicular to all of them;
and the set of points on a particular hyperplane is then
$$\leftset (x↓1, \ldotss , x↓t) \,\relv\, x↓1u↓1 + \cdots + x↓tu↓t = q\rightset,
\eqno(11)$$
where $q$ is a different constant for each hyperplane
in the family. In other words, each hyperplane is the set of
all $X$ for which the {\sl dot product} $X \cdot U$ has a given
value $q$. In our case the hyperplanes are all separated by
a fixed distance, and one of them contains $(0, 0, \ldotss , 0)$;
hence we can adjust the magnitude of $U$ so that the set of
all {\sl integer} values $q$ gives all the hyperplanes in the
family. Then the distance between neighboring hyperplanes is
the minimum distance from $(0, 0, \ldotss , 0)$ to the hyperplane
for $q = 1$, namely
$$\min↓{\char'162\char'145\char'141\char'154\,x↓1,\ldotss,x↓t}\left.\left\{
\sqrt{x↑{2}↓{1} + \cdots + x↑{2}↓{t}}\;\right|\; x↓1u↓1 + \cdots +
x↓tu↓t = 1\right\}.\eqno (12)$$
Cauchy's inequality (cf.\ exercise 1.2.3--30) tells
us that
$$(x↓1u↓1 + \cdots + x↓tu↓t)↑2 ≤ (x↑{2}↓{1} + \cdots + x↑{2}↓{t})(u↑{2}↓{1}
+ \cdots + u↑{2}↓{t}),\eqno (13)$$
hence the minimum in (12) occurs when each $x↓j
= u↓j/(u↑{2}↓{1} + \cdots + u↑{2}↓{t})$; the distance between
neighboring hyperplanes is
$$1\left/\sqrt{u↑{2}↓{1} + \cdots + u↑{2}↓{t}}\right.
= 1/\hjust{length}(U).\eqno (14)$$
In other words, the quantity $\nu ↓t$ we are trying
to calculate is precisely the length of the shortest vector
$U$ which defines a family of hyperplanes $\leftset X \cdot U = q \relv
\hjust{integer }q\rightset$ containing all the elements of $L↓0$.
Such a vector $U = (u↓1, \ldotss , u↓t)$
must be nonzero, and it must satisfy $V \cdot U =$ integer for
all $V$ in $L↓0$. In particular, since the points $(1, 0, \ldotss
, 0)$, $(0, 1, \ldotss , 0)$, $\ldotss$, $(0, 0, \ldotss , 1)$ are all
in $L↓0$, all of the $u↓j$ must be integers. Furthermore since
$V↓1$ is in $L↓0$, we must have ${1\over m} (u↓1 + au↓2 + \cdots
+ a↑{t-1}u↓t) =$ integer, i.e.,
$$u↓1 + au↓2 + \cdots + a↑{t-1}u↓t ≡ 0\quad \modulo m.\eqno(15)$$
Conversely, any nonzero integer vector $U = (u↓1,
\ldotss , u↓t)$ satisfying (15) defines a family of hyperplanes
with the required properties, since all of $L↓0$ will be covered:
$(y↓1V↓1 + \cdots + y↓tV↓t) \cdot U$ will be an integer for
all integers $y↓1$, $\ldotss$, $y↓t$. We have proved that
$$\eqalign{ \nu ↑{2}↓{t}⊗ =\!\min↓{(u↓1,\ldotss,u↓t)≠(0,\ldotss,0)}\!\!
\left.\left\{ u↑{2}↓{1}\!+ \cdots +\!u↑{2}↓{t} \;\right|\;
u↓1\!+\!au↓2\!+ \cdots +\!a↑{t-1}u↓t ≡ 0\modulo m\right\}\cr
⊗=\!\min↓{(x↓1, \ldotss , x↓t)≠(0, \ldotss , 0)}\!\left((mx↓1
\!-\!ax↓2\!-\!a↑2x↓3\!- \cdots -\!a↑{t-1}x↓t)↑2 + x↑{2}↓{2} + x↑{2}↓{3}
+ \cdots + x↑{2}↓{t}\right).\cr}\eqno(16)$$ % This (16) will be dropped down
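% [An illustrative Python fragment, added editorially; not part of the
% manuscript.]  In two dimensions the minimum (16) can be found by brute
% force for the small generator (4), confirming the value nu_2 = sqrt(274)
% mentioned earlier in connection with Fig. 8:
%
%   m, a = 256, 137
%   best = m * m                                  # the case x_2 = 0, x_1 = 1
%   for x2 in range(1, m):                        # x_2 >= m cannot do better
%       r = (a * x2) % m
%       u = min(r, m - r)                         # best |m*x_1 - a*x_2|
%       best = min(best, u * u + x2 * x2)
%   print(best)                                   # 274, so nu_2 = sqrt(274)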
\subsectionbegin{C. Deriving a computational method} We
have now reduced the spectral test to the problem of finding
the minimum value (16); but how on earth can we determine that
minimum value in a reasonable amount of time? A brute-force
search is out of the question, since $m$ is very large in cases
of practical interest.
It will be interesting and probably more useful if we develop
a computational method for solving an even more general problem:
{\sl Find the minimum value of the quantity
$$f(x↓1, \ldotss , x↓t) = (u↓{11}x↓1 + \cdots + u↓{t1}x↓t)↑2
+ \cdots + (u↓{1t}x↓1 + \cdots + u↓{tt}x↓t)↑2\eqno (17)$$
over all nonzero integer vectors $(x↓1, \ldotss
, x↓t)$}, given any nonsingular matrix of coefficients $U = (u↓{ij})$.
The expression (17) is called a ``positive definite quadratic
form'' in $t$ variables. Since $U$ is nonsingular, (17) cannot
be zero unless the $x↓j$ are all zero.
Let us write $U↓1$, $\ldotss$, $U↓t$ for the rows of $U$. Then (17)
may be written
$$f(x↓1, \ldotss , x↓t) = (x↓1U↓1 + \cdots + x↓tU↓t) \cdot (x↓1U↓1
+ \cdots + x↓tU↓t),\eqno (18)$$
the square of the length of the vector $x↓1U↓1
+ \cdots +x↓tU↓t$. The nonsingular matrix $U$ has an inverse,
which means that we can find uniquely determined vectors $V↓1,
\ldotss , V↓t$ such that
$$U↓i \cdot V↓j = \delta ↓{ij},\qquad 1 ≤ i, j ≤ t.\eqno (19)$$
For example, in the special form (16)
which arises in the spectral test, we have
$$\vcenter{\halign to 300pt{\rt{$#=($}⊗\rt{$#,\,$}⊗$#),$\tabskip 0pt plus 100pt
⊗\tabskip 0pt\rt{$#=\;$}⊗\rt{$#,$}⊗\rt{$\,#,$}⊗\rt{$\,#,\ldotss,$}⊗\rt{$\,#$}\cr
U↓1⊗m⊗0,0,\ldotss,0⊗V↓1⊗{1\over m}(1⊗a⊗a↑2⊗a↑{t-1}),\cr
U↓2⊗-a⊗1,0,\ldotss,0⊗V↓2⊗(0⊗1⊗0⊗0),\cr
U↓3⊗-a↑2⊗0,1,\ldotss,0⊗V↓3⊗(0⊗0⊗1⊗0),\cr
\noalign{\hjust to 300pt{\hfill\leaders\hjust to 15pt{\hfill.\hfill}\hskip 200pt
\hfill}}
U↓t⊗-a↑{t-1}⊗0,0,\ldotss,1⊗V↓t⊗(0⊗0⊗0⊗1).\cr}}\eqno(20)$$
These $V↓j$ are precisely the vectors (8), (9) which
we used to define our original lattice $L↓0$. As the reader
may well suspect, this is not a coincidence---indeed, if we
had begun with an arbitrary lattice $L↓0$, defined by any set
of linearly independent vectors $V↓1, \ldotss , V↓t$, the argument
we have used above can be generalized to show that the maximum
separation between hyperplanes in a covering family is equivalent
to minimizing (17), where the coefficients $u↓{ij}$ are defined
by (19). (See exercise 2.)
Our first step in minimizing (18) is to reduce it to a finite
problem, i.e., to show that we won't need to test infinitely
many vectors $(x↓1, \ldotss , x↓t)$ to find the minimum. This
is where the vectors $V↓1, \ldotss , V↓t$ come in handy; we have
$$x↓k = (x↓1U↓1 + \cdots + x↓tU↓t) \cdot V↓k,$$
and Cauchy's inequality tells us that
$$\biglp (x↓1U↓1 + \cdots + x↓tU↓t) \cdot V↓k\bigrp ↑2
≤ f(x↓1, \ldotss , x↓t)(V↓k \cdot V↓k).$$
Hence we have derived a useful upper bound
on each coordinate $x↓k$:
\thbegin Lemma A. {\sl Let $(x↓1, \ldotss , x↓t)$ be a nonzero
vector which minimizes $(18)$ and let $(y↓1, \ldotss , y↓t)$ be any
nonzero integer vector. Then
$$x↑{2}↓{k} ≤ (V↓k \cdot V↓k)f(y↓1, \ldotss , y↓t),\qquad\hjust{for }
1≤ k ≤ t.\eqno (21)$$
In particular, letting $y↓i = \delta ↓{ij}$ for all $i,$
$$x↑{2}↓{k} ≤ (V↓k \cdot V↓k)(U↓j \cdot U↓j),\qquad\hjust{for }
1 ≤ j, k ≤ t.\quad\blackslug\eqno (22)$$}
Lemma A reduces the problem to a finite
search, but the right-hand side of (21) is usually much too
large to make an exhaustive search feasible; we need at least
one more idea. On such occasions, an old maxim provides sound
advice: ``If you can't solve a problem as it is stated, change
it into a simpler problem that has the same answer.'' For example,
Euclid's algorithm has this form; if we don't know the gcd of
the input numbers, we change them into smaller numbers having
the same gcd. (In fact, a slightly more general approach probably
underlies the discovery of nearly all algorithms: ``If you can't
solve a problem directly, change it into one or more simpler
problems, from whose solution you can solve the original one.'')
In our case, a simpler problem is one which requires less searching
because the right-hand side of (22) is smaller. The key idea
we shall use is that it is possible to change one quadratic
form into another one which is equivalent for all practical
purposes. Let $j$ be any fixed subscript, $1 ≤ j ≤ t$; let $(q↓1,
\ldotss , q↓{j-1}, q↓{j+1}, \ldotss , q↓t)$ be any sequence of
$t - 1$ integers; and consider the following transformation
of the vectors:
$$\vcenter{\halign{\rt{$#$}⊗\lft{$\;#$}\qquad⊗\rt{$#$}⊗\lft{$\;#$}\qquad
⊗\rt{$#$}⊗\lft{$\;#$}\cr
{V↓i}↑\prime ⊗ = V↓i - q↓iV↓j,⊗x↓i↑\prime⊗= x↓i
- q↓ix↓j,⊗{U↓i}↑\prime⊗=U↓i,\qquad\hjust{for }i ≠ j;\cr
\noalign{\vskip 3pt}
{V↓j}↑\prime⊗= V↓j,⊗x↓j↑\prime⊗= x↓j,⊗{U↓j}↑\prime⊗= U↓j + \sum
↓{i≠j}q↓iU↓i.\cr}}\eqno (23)$$
It is easy to see that the new vectors ${U↓1}↑\prime$,
$\ldotss$, ${U↓t}↑\prime$ define a quadratic form $f↑\prime$
for which $f↑\prime (x↓1↑\prime, \ldotss , x↓t↑\prime)
= f(x↓1, \ldotss , x↓t)$; furthermore the basic orthogonality
condition (19) remains valid, because it is easy to check that
${U↓i}↑\prime \cdot {V↓j}↑\prime = \delta ↓{ij}$. As $(x↓1,
\ldotss , x↓t)$ runs through all nonzero integer vectors, so
does $(x↓1↑\prime, \ldotss ,x↓t↑\prime)$;
hence the new form $f↑\prime$ has the same minimum as $f$.
%folio 126 galley 6 (C) Addison-Wesley 1978 *
Our goal is to use transformation
(23), replacing $U↓i$ by ${U↓i}↑\prime$ and $V↓i$ by ${V↓i}↑\prime$
for all $i$, in order to make the right-hand side of (22) small;
and the right-hand side of (22) will be small when both $U↓j
\cdot U↓j$ and $V↓k \cdot V↓k$ are small.
Therefore it is natural
to ask the following two questions about the transformation
(23):
\yyskip
\hsize 310pt
\noindent\hjust to 38pt{\hfill a) }\hangindent 38pt
{\sl What choice of $q↓i$ makes ${V↓i}↑\prime \cdot
{V↓i}↑\prime$ as small as possible?}
\penalty1000 \vskip3pt
\noindent\hjust to 38pt{\hfill b) }\hangindent 38pt
{\sl What choice of $q↓1$, $\ldotss$, $q↓{j-1}$,
$q↓{j+1}$, $\ldotss$, $q↓t$ makes ${U↓j}↑\prime \cdot {U↓j}↑\prime$
as small as possible?}
\hsize 348pt
\yyskip It is easiest to solve these questions
first for {\sl real} values of the $q↓i$. Question (a) is quite
simple, since
$$\eqalign{
(V↓i - q↓iV↓j) \cdot (V↓i - q↓iV↓j) ⊗= V↓i \cdot
V↓i - 2q↓iV↓i \cdot V↓j + q↑{2}↓{i}V↓j \cdot V↓j\cr
\noalign{\vskip 3pt}
⊗= (V↓j \cdot V↓j)\biglp q↓i - (V↓i \cdot V↓j\;/\;V↓j
\cdot V↓j)\bigrp↑2\cr
⊗\qquad\null + V↓i \cdot V↓i - (V↓i \cdot V↓j)↑2\;/\;V↓j \cdot V↓j,\cr}$$
and the minimum occurs when
$$q↓i = V↓i \cdot V↓j\;/\;V↓j \cdot V↓j.\eqno (24)$$
Geometrically, we are asking what multiple of $V↓j$
should be subtracted from $V↓i$ so that the resulting vector
${V↓i}↑\prime$ has minimum length, and the answer is to choose
$q↓i$ so that ${V↓i}↑\prime$ is perpendicular to $V↓j$ (i.e.,
so that ${V↓i}↑\prime \cdot V↓j = 0$); the diagram
$$\vcenter{\hjust{\vjust to 100pt{}}}\eqno(25)$$
makes this plain.
Turning to question (b), we want to choose the $q↓i$ so that
$U↓j + \sum ↓{i≠j} q↓iU↓i$ has minimum length; geometrically,
we want to add some vector, in the $(t - 1)$-dimensional hyperplane
whose points are the sums of multiples of $\leftset U↓i \relv i ≠ j\rightset$,
to $U↓j$. Again the best solution is to choose things so that
${U↓j}↑\prime$ is perpendicular to the hyperplane, i.e.,
so that ${U↓j}↑\prime \cdot U↓k = 0$ for all $k ≠ j$, i.e.,
$$U↓j \cdot U↓k + \sum ↓{i≠j}q↓i(U↓i \cdot U↓k) = 0,\qquad 1
≤ k ≤ t,\qquad k ≠ j.\eqno (26)$$
\vskip -8pt\noindent
(See exercise 12 for a rigorous proof that a solution
to question (b) must satisfy these $t - 1$ equations.)
Now that we have answered questions (a) and (b), we are in a
bit of a quandary; should we choose the $q↓i$ according to (24),
so that the ${V↓i}↑\prime \cdot {V↓i}↑\prime$ are minimized,
or according to (26), so that ${U↓j}↑\prime \cdot {U↓j}↑\prime$
is minimized? Either of these alternatives makes an improvement
in the right-hand side of (22), so it is not immediately clear
which choice should get priority. Fortunately, there is a very
simple answer to this dilemma: Conditions (24) and (26) are
exactly the same! (See exercise 7.) Therefore questions (a)
and (b) have the same answer; we have a happy state of affairs
in which we can reduce the length of both the $U$'s and the
$V$'s simultaneously. (It may be worthwhile to point out that
we have just rediscovered the ``Schmidt orthogonalization process.'')
Our joy must be tempered with the realization that we have dealt
with questions (a) and (b) only for {\sl real} values of the
$q↓i$. Our application restricts us to integer values, so we
cannot make ${V↓i}↑\prime$ exactly perpendicular to $V↓j$.
The best we can do for question (a) is to let $q↓i$ be the {\sl
nearest integer} to $V↓i \cdot V↓j\;/\;V↓j \cdot V↓j$ $\biglp$cf.\
(25)$\bigrp$. It turns out that this is {\sl not} always the
best solution to question (b), in fact ${U↓j}↑\prime$
may at times be longer than $U↓j$; however, the bound (21) is
never increased, since we can remember the smallest value of
$f(y↓1, \ldotss , y↓t)$ found so far, so a choice of $q↓i$ based
only on question (a) is quite satisfactory.
If we apply transformation (23) repeatedly in such a way that
none of the vectors $V↓i$ gets longer and at least one gets
shorter, we can never get into a loop; i.e., we will never be
considering the same quadratic form again after a sequence of
nontrivial transformations of this kind. But eventually we will
get ``stuck'', in the sense that none of the transformations
(23) for $1 ≤ j ≤ t$ will be able to shorten any of the vectors
$V↓1$, $\ldotss$, $V↓t$. At that point we can revert to an exhaustive
search, using the bounds of Lemma A, which will now be quite
small in most cases. Occasionally these bounds (21) will be
poor, and another type of transformation will usually get the
algorithm unstuck again and reduce the bounds (see exercise
18). However, transformation (23) by itself has proved to be
quite adequate for the spectral test; in fact, it has proved
to be amazingly powerful when the computations are arranged
as in the algorithm discussed below.
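% [An illustrative Python fragment, added editorially; not part of the
% manuscript, and not a substitute for Algorithm S below.]  It applies
% transformation (23), with each q_i the integer nearest V_i.V_j / V_j.V_j
% as in (24), to the three-dimensional lattice of the small generator (4).
% The V_i are scaled by m so that all entries are integers, as in the
% algorithm below; after the passes, min_j U_j.U_j is an upper bound on
% nu_3 squared, since every row U_j is nonzero and satisfies (15).
%
%   m, a, t = 256, 137, 3
%   U = [[m, 0, 0], [-a, 1, 0], [-a * a, 0, 1]]             # rows of (20)
%   V = [[1, a, a * a], [0, m, 0], [0, 0, m]]               # rows of (20), times m
%   dot = lambda X, Y: sum(x * y for x, y in zip(X, Y))
%
%   for _ in range(3):                            # a few reduction passes
%       for j in range(t):
%           for i in range(t):
%               if i == j:
%                   continue
%               q = round(dot(V[i], V[j]) / dot(V[j], V[j]))    # Eq. (24)
%               if q:
%                   V[i] = [vi - q * vj for vi, vj in zip(V[i], V[j])]
%                   U[j] = [uj + q * ui for uj, ui in zip(U[j], U[i])]
%
%   assert all(dot(U[i], V[j]) == (m if i == j else 0)
%              for i in range(t) for j in range(t))         # condition (29)
%   assert all((U[i][0] + a * U[i][1] + a * a * U[i][2]) % m == 0
%              for i in range(t))                           # condition (15)
%   print(min(dot(U[i], U[i]) for i in range(t)))           # >= nu_3 ** 2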
\subsectionbegin{\star D. How to perform the spectral test}
Here now is an efficient computational procedure which follows
from our considerations. R. W. Gosper and U. Dieter have discovered
that it is possible to use the results of lower dimensions to
make the spectral test significantly faster in higher dimensions.
This refinement has been incorporated into the following algorithm,
together with a significant simplification of the two-dimensional
case.
\algbegin Algorithm S (The spectral test).
This algorithm determines the value of
$$\nu ↓t = \min\!\left.\left\{\sqrt{x↑{2}↓{1} + \cdots + x↑{2}↓{t}}\;\right|\;
(x↓1,\ldotss,x↓t)≠(0,\ldotss,0)\hjust{ and }
x↓1 + ax↓2+\cdots+a↑{t-1}x↓t≡0\modulo m\right\}\eqno (27)$$
for $2 ≤ t ≤ T$, given $a$, $m$, and $T$, where $0
< a < m$ and $a$ is relatively prime to $m$. (The number $\nu
↓t$ measures the $t$-dimensional accuracy of random-number generators,
as discussed in the text above.) All arithmetic within this algorithm
is done on integers whose magnitudes rarely if ever exceed $m↑2$,
except in step S8; in fact, nearly all of the integer variables
will be less than $m$ in absolute value during the computation.
When $\nu ↓t$ is being calculated for $t ≥3$, the algorithm
works with two $t \times t$ matrices $U$ and $V$, whose row
vectors are denoted by $U↓i = (u↓{i1}, \ldotss , u↓{it})$ and
$V↓i = (v↓{i1}, \ldotss , v↓{it})$ for $1 ≤ i ≤ t$. These vectors
satisfy the conditions
$$\tabskip0ptplus100pt\halign to size{\ctr{$\dispstyle#$}⊗\rt{$#$}\tabskip0pt\cr
u↓{i1} + au↓{i2} + \cdots + a↑{t-1}u↓{it} ≡ 0 \modulo m,
\qquad 1 ≤ i ≤ t;⊗(28)\cr
\noalign{\penalty1000\vskip 4pt}
U↓i\cdot V↓j = m\,\delta↓{ij},\qquad 1≤i,j≤t.⊗(29)\cr}$$
(Thus the $V↓j$ of our previous discussion have
been multiplied by $m$, to ensure that their components are
integers.) There are three other auxiliary vectors, $X = (x↓1,
\ldotss , x↓t)$, $Y = (y↓1, \ldotss , y↓t)$, and $Z = (z↓1, \ldotss
, z↓t)$. During the entire algorithm, $r$ will denote $a↑{t-1}
\mod m$ and $s$ will denote the smallest upper bound for $\nu↑{2}↓{t}$
which has been discovered so far.
\algstep S1. [Initialize.]
Set $h ← a$, $h↑\prime ← m$, $p ← 1$, $p↑\prime ← 0$, $r ← a$, $s ← 1
+ a↑2$. (The first steps of this algorithm handle the case $t
= 2$ by a special method, very much like Euclid's algorithm;
we will have
$$h - ap ≡ h↑\prime - ap↑\prime ≡ 0\modulo m\qquad\hjust{and}
\qquad hp↑\prime - h↑\prime p = \pm m\eqno (30)$$
during this phase of the calculation.)
\algstep S2. [Euclidean step.] Set $q ← \lfloor h↑\prime
/h\rfloor$, $u ← h↑\prime - qh$, $v ← p↑\prime - qp$. If $u↑2 + v↑2
< s$, set $s ← u↑2 + v↑2$, $h↑\prime ← h$, $h ← u$, $p↑\prime
← p$, $p ← v$, and repeat step S2.
\algstep S3. [Compute $\nu ↓2$.] Set $u ← u - h$, $v ←
v - p$; and if $u↑2 + v↑2 < s$, set $s ← u↑2 + v↑2$, $h↑\prime
← u$, $p↑\prime ← v$. Then output $\sqrt{s} = \nu ↓2$. (The
validity of this calculation for the two-dimensional case is
proved in exercise 5. Now we will set up the $U$ and $V$ matrices
satisfying (28) and (29), in preparation for calculations in
higher dimensions.) Set
$$U ← \left(\vcenter{\halign{\ctr{$ # $}\quad⊗\ctr{$ # $}\cr
-h⊗p\cr-h↑\prime⊗p↑\prime\cr}}\right),\qquad
V ← \pm\left(\vcenter{\halign{\ctr{$ # $}\quad⊗\ctr{$ # $}\cr
p↑\prime⊗h↑\prime\cr-p⊗-h\cr}}\right),$$
where the $-$ sign is chosen for $V$ if and
only if $p↑\prime > 0$.
\algstep S4. [Advance $t$.] If $t = T$, the algorithm
terminates. (Otherwise we want to increase $t$ by 1. At this
point $U$ and $V$ are $t \times t$ matrices satisfying (28)
and (29), and we must enlarge them by adding an appropriate
new row and column.) Set $t ← t + 1$ and $r ← (ar)\mod m$.
Set $U↓t$ to the new row $(-r, 0, 0, \ldotss , 0, 1)$ of $t$
elements, and set $u↓{it} ← 0$ for $1 ≤ i < t$. Set $V↓t$ to
the new row $(0, 0, 0, \ldotss , 0, m)$. Finally, for $1 ≤ i
< t$, set $q ←\hjust{round}(v↓{i1}r/m)$, $v↓{it} ← v↓{i1}r - qm$,
and $U↓t ← U↓t + qU↓i$. (Here ``round$(x)$'' denotes the nearest
integer to $x$, e.g., $\lfloor x + 1/2\rfloor $. We are essentially
setting $v↓{it} ← v↓{i1}r$ and immediately applying transformation
(23) with $j = t$, since the numbers $|v↓{i1}r|$ are so large
they ought to be reduced at once.) Then set $s ←\min(s,
U↓t \cdot U↓t)$, $k← t$, and $j ← 1$. (In the following steps,
$j$ denotes the current row index for transformation (23), and
$k$ denotes the last such index where the transformation shortened
at least one of the $V↓i$.)
\def\\{\cdot}
\algstep S5. [Transform.] For $1 ≤ i ≤ t$, do the following
operations: If $i ≠ j$ and $2|V↓i \\ V↓j| > V↓j \\ V↓j$,
set $q ←\hjust{round}(V↓i \\ V↓j\;/\;V↓j \\ V↓j)$, $V↓i ← V↓i - qV↓j$,
$U↓j ← U↓j + qU↓i$, and $k ← j$. (The fact that we avoid this
transformation when $2|V↓i \\ V↓j|$ exactly equals $ V↓j \\ V↓j$
prevents the algorithm from looping endlessly; see exercise
19.)
\algstep S6. [Examine new bound.] If $k = j$ (i.e., if
the transformation in S5 has just done something useful), set
$s ← \min(s, U↓j \\ U↓j)$.
\algstep S7. [Advance $j$.] If $j = t$, set $j ←
1$; otherwise set $j ← j + 1$. Now if $j ≠ k$, return to step
S5. (If $j = k$, we have gone through $t - 1$ consecutive cycles
of no transformation, so the transformation process is stuck.)
\algstep S8. [Prepare for search.] (Now the absolute
minimum will be determined, using an exhaustive search over
all $(x↓1, \ldotss , x↓t)$ satisfying the condition (21) of Lemma
A.) Set $X ← Y ← (0, \ldotss , 0)$, set $k ← t$, and set
$$\qquad z↓j ← \left\lfloor \sqrt{\lfloor (V↓j \\ V↓j)s/m↑2\rfloor}
\right\rfloor ,\qquad\hjust{for }1 ≤ j ≤ t.\eqno (31)$$
(We will examine all $X = (x↓1, \ldotss , x↓t)$
with $|x↓j| ≤ z↓j$ for $1 ≤ j ≤ t$. In hundreds of applications
of this algorithm,
no $z↓j$ has yet turned out to be greater than 1, nor has the
exhaustive search in the following steps ever reduced $s$; however,
such phenomena are probably possible in weird cases, especially
in higher dimensions. During the exhaustive search, the vector
$Y$ will always be equal to $x↓1U↓1 + \cdots + x↓tU↓t$, so that
$f(x↓1, \ldotss , x↓t) = Y \\ Y$. Since $f(-x↓1, \ldotss ,
-x↓t) = f(x↓1, \ldotss , x↓t)$, we shall examine only vectors
whose first nonzero component is positive. The method is essentially
that of counting in steps of one, regarding $(x↓1, \ldotss ,
x↓t)$ as the digits in a balanced number system with mixed radices
$(2z↓1 + 1, \ldotss , 2z↓t + 1)$; cf.\ Section 4.1.)
\algstep S9. [Advance $x↓k$.] If $x↓k = z↓k$, go to S11.
Otherwise increase $x↓k$ by 1 and set $Y ← Y + U↓k$.
\algstep S10. [Advance $k$.] Set $k ← k + 1$. Then
if $k ≤ t$, set $x↓k ← -z↓k$, $Y ← Y - 2z↓kU↓k$, and repeat step
S10. But if $k > t$, set $s ← \min(s, Y \\ Y)$.
\algstep S11. [Decrease $k$.] Set $k ← k - 1$. If
$k ≥ 1$, return to S9. Otherwise output $\nu ↓t =
\sqrt{s}$ (the exhaustive search is completed) and return to
S4.\quad\blackslug
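\yyskip Readers who wish to experiment may find the following transcription of
Algorithm S into Python helpful. It is offered only as an illustrative sketch (the
variable names follow the algorithm, and exact integer arithmetic is used
throughout); it assumes, as the algorithm does, that $0 < a < m$ and $a$ is
relatively prime to $m$.

from math import isqrt

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def round_div(num, den):
    # nearest integer to num/den for den > 0, i.e., floor(num/den + 1/2)
    return (2 * num + den) // (2 * den)

def spectral_test(a, m, T):
    # Returns {t: nu_t squared} for 2 <= t <= T, following steps S1--S11.
    # S1. [Initialize.]
    h, hp, p, pp, r = a, m, 1, 0, a
    s = 1 + a * a
    # S2. [Euclidean step.]
    while True:
        q = hp // h
        u, v = hp - q * h, pp - q * p
        if u * u + v * v < s:
            s, hp, h, pp, p = u * u + v * v, h, u, p, v
        else:
            break
    # S3. [Compute nu_2, then set up U and V satisfying (28) and (29).]
    u, v = u - h, v - p
    if u * u + v * v < s:
        s, hp, pp = u * u + v * v, u, v
    result = {2: s}
    sg = -1 if pp > 0 else 1
    U = [[-h, p], [-hp, pp]]
    V = [[sg * pp, sg * hp], [-sg * p, -sg * h]]
    t = 2
    while t < T:
        # S4. [Advance t.]
        t += 1
        r = (a * r) % m
        for row in U:
            row.append(0)
        for row in V:
            row.append(0)
        Ut = [-r] + [0] * (t - 2) + [1]
        for i in range(t - 1):
            q = round_div(V[i][0] * r, m)
            V[i][t - 1] = V[i][0] * r - q * m
            Ut = [x + q * y for x, y in zip(Ut, U[i])]
        U.append(Ut)
        V.append([0] * (t - 1) + [m])
        s = min(s, dot(Ut, Ut))
        k, j = t, 1
        # S5--S7. [Transform until no V_i can be shortened.]
        while True:
            for i in range(1, t + 1):                      # S5. [Transform.]
                if i != j:
                    vjj = dot(V[j - 1], V[j - 1])
                    vij = dot(V[i - 1], V[j - 1])
                    if 2 * abs(vij) > vjj:
                        q = round_div(vij, vjj)
                        V[i - 1] = [x - q * y for x, y in zip(V[i - 1], V[j - 1])]
                        U[j - 1] = [x + q * y for x, y in zip(U[j - 1], U[i - 1])]
                        k = j
            if k == j:                                     # S6. [Examine new bound.]
                s = min(s, dot(U[j - 1], U[j - 1]))
            j = 1 if j == t else j + 1                     # S7. [Advance j.]
            if j == k:
                break
        # S8. [Prepare for search.]
        X, Y = [0] * t, [0] * t
        z = [isqrt(dot(V[i], V[i]) * s // (m * m)) for i in range(t)]
        k = t
        # S9--S11. [Search all |x_i| <= z_i whose first nonzero component is positive.]
        while True:
            if X[k - 1] != z[k - 1]:                       # S9. [Advance x_k.]
                X[k - 1] += 1
                Y = [y + u for y, u in zip(Y, U[k - 1])]
                k += 1                                     # S10. [Advance k.]
                while k <= t:
                    X[k - 1] = -z[k - 1]
                    Y = [y - 2 * z[k - 1] * u for y, u in zip(Y, U[k - 1])]
                    k += 1
                s = min(s, dot(Y, Y))
            k -= 1                                         # S11. [Decrease k.]
            if k < 1:
                break
        result[t] = s
    return result

For instance, spectral_test(3141592621, 10**10, 3) should reproduce the values
$\nu↑{2}↓{2} = 67654↑2+226↑2$ and $\nu↑{2}↓{3} = 227↑2+983↑2+130↑2$ obtained in the
worked example below.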
%folio 131 galley 7 (C) Addison-Wesley 1978 *
\yyskip In practice Algorithm S
is applied for $T = 5$ or 6, say; it usually works reasonably
well when $T = 7$ or 8, but it can be terribly slow when $T
≥ 9$ since the exhaustive search tends to make the running time
grow as $3↑T$. (If the minimum value $\nu ↓t$ occurs at many
different points, the exhaustive search will hit them all; hence
we typically find that all $z↓k = 1$ for large $t$. As remarked
in the text, the values of $\nu ↓t$ are generally irrelevant
for practical purposes when $t$ is large.)
An example will help to make Algorithm S clear. Consider the
linear congruential sequence defined by
$$m = 10↑{10},\qquad a = 3141592621,\qquad c = 1,\qquad X↓0 = 0.\eqno (32)$$
Six cycles of the Euclidean algorithm in steps
S2 and S3 suffice to prove that the minimum nonzero value of
$x↑{2}↓{1} + x↑{2}↓{2}$ with
$$x↓1 + 3141592621 x↓2 ≡ 0\modulo{10↑{10}}$$
occurs for $x↓1 = 67654$, $x↓2 = 226$; hence the
two-dimensional accuracy of this generator is
$$\nu ↓2 = \sqrt{67654↑2 + 226↑2} \approx 67654.37748.$$
Passing to three dimensions, we seek the minimum
nonzero value of $x↑{2}↓{1} + x↑{2}↓{2} + x↑{2}↓{3}$ such that
$$x↓1 + 3141592621x↓2 + 3141592621↑2x↓3 ≡ 0\modulo{10↑{10}}.\eqno(33)$$
Step S4 sets up the matrices
$$U=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-67654⊗-226⊗0\cr-44190611⊗191⊗0\cr5793866⊗33⊗1\cr}}\right),\quad
V=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-191⊗-44190611⊗2564918569\cr-226⊗67654⊗1307181134\cr0⊗0⊗10000000000\cr}}\right).$$
The first iteration of step S5, with $q = 1$ for
$i = 2$ and $q = 4$ for $i = 3$, changes them to
$$U=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-21082801⊗97⊗4\cr-44190611⊗191⊗0\cr5793866⊗33⊗1\cr}}\right),\quad
V=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-191⊗-44190611⊗2564918569\cr-35⊗44258265⊗-1257737435\cr764⊗176762444⊗-259674276\cr
}}\right).$$
(Note that the first row $U↓1$ has actually gotten
longer in this transformation, although eventually the rows
of $U$ should get shorter.)
The next fourteen iterations of step S5 have
\def\\{\char'403} % asterisk centered on axis
$(j, q↓1, q↓2, q↓3) = (2, -2, \\, 0)$, $(3, 0, 3,
\\)$, $(1, \\, -10, -1)$, $(2, -1, \\, -6)$, $(3, -1, 0, \\)$, $(1,
\\, 0, 2)$, $(2, 0, \\, -1)$, $(3, 3, 4, \\)$, $(1, \\, 0, 0)$, $(2,
-5, \\, 0)$, $(3, 1, 0, \\)$, $(1, \\, -3, -1)$, $(2, 0, \\, 0)$, $(3,
0, 0, \\)$. Now the transformation process is stuck, but
the rows of the matrices have become significantly shorter:
$$U=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-1479⊗616⊗-2777\cr-3022⊗104⊗918\cr-227⊗-983⊗-130\cr}}\right),\quad
V=\left(\vcenter{\halign{\rt{$#$}\hskip5pt⊗\rt{$#$}\hskip5pt⊗\rt{$#$}\cr
-888874⊗601246⊗-2994234\cr-2809871⊗438109⊗1593689\cr-854296⊗-9749816⊗-1707736\cr
}}\right).\eqno(34)$$
The search limits $(z↓1, z↓2, z↓3)$ in step S8
turn out to be $(0, 0, 1)$, so $U↓3$ is the shortest solution
to (33); we have
$$\nu ↓3 = \sqrt{227↑2 + 983↑2 + 130↑2} \approx 1017.21089.$$
Note that only a few iterations were needed to
find this value, although condition (33) looks quite difficult
to deal with at first glance. All points $(U↓n, U↓{n+1}, U↓{n+2})$
produced by this random-number generator lie on a family of
parallel planes about 0.001 units apart.
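These two short vectors are easy to check independently. The following few lines
(an illustrative aside, not part of the text) verify that the vectors just exhibited
satisfy the required congruences for the generator (32), and recompute the lengths
$\nu↓2$ and $\nu↓3$:

from math import sqrt

m, a = 10**10, 3141592621
assert (67654 + a * 226) % m == 0                 # the nu_2 vector satisfies x1 + a*x2 = 0 (mod m)
assert (227 + a * 983 + a * a * 130) % m == 0     # the nu_3 vector satisfies (33)
print(sqrt(67654**2 + 226**2))                    # 67654.3774..., the value nu_2
print(sqrt(227**2 + 983**2 + 130**2))             # 1017.2108..., the value nu_3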
\subsectionbegin{E. Ratings for various generators}
So far we haven't really given a criterion that tells us whether
or not a particular random-number generator ``passes'' or ``fails''
the spectral test. In fact, this depends on the application,
since some applications demand higher resolution than others.
It appears that $\nu ↓t ≥ 2↑{30/t}$ for $2 ≤ t ≤ 6$ will be
quite adequate in most applications (although the author must
admit choosing this criterion partly because 30 is conveniently
divisible by 2, 3, 5, and 6).
For some purposes we would like a criterion that is relatively
independent of $m$, so we can say that a particular multiplier
is good or bad with respect to the set of all other multipliers
for the given $m$, without examining any others. A reasonable
figure of merit for rating the goodness of a particular multiplier
seems to be the volume of the ellipsoid in $t$-space defined
by the relation $(x↓1m - x↓2a - \cdots - x↓ta↑{t-1})↑2 +
x↑{2}↓{2} + \cdots + x↑{2}↓{t} ≤ \nu ↑{2}↓{t}$, since this
volume tends to indicate how likely it is that nonzero {\sl
integer} points $(x↓1, \ldotss , x↓t)$---corresponding to solutions
of (15)---are in the ellipsoid. We therefore propose to calculate
this volume, namely
$$\mu ↓t = {π↑{t/2}\nu ↑{t}↓t\over(t/2)!m},\eqno (35)$$
as an indication of the effectiveness of the multiplier
$a$ for the given $m$. In this formula,
$$\left(t\over 2\right)! = \left(t\over 2\right)\left({t\over 2} -
1\right)\cdots \left(1\over 2\right)\sqrt π,\qquad\hjust{for $t$ odd.}\eqno
(36)$$
Thus, in six or fewer dimensions the merit is computed
as follows:
$$\vcenter{\halign{\ctr{$#$}\cr
\mu ↓2 = π\nu ↑{2}↓{2}/m,\qquad \mu ↓3 = {4\over 3}π\nu ↑{3}↓{3}/m,\qquad
\mu ↓4 = {1\over 2}π↑2\nu ↑{4}↓{4}/m,\cr
\noalign{\vskip 4pt} \mu ↓5 = {8\over 15}π↑2\nu
↑{5}↓{5}/m,\qquad \mu ↓6 = {1\over 6}π↑3\nu ↑{6}↓{6}/m.\cr}}\eqno (37)$$
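Formula (35) is easy to evaluate mechanically; the following sketch (not part of the
text) uses the standard gamma function, with $(t/2)! = \Gamma(t/2+1)$, and applies it
to the $\nu↓2$ and $\nu↓3$ values found above for the generator (32):

from math import pi, gamma

def merit(nu, t, m):
    # Equation (35): mu_t = pi^(t/2) * nu_t^t / ((t/2)! * m), with (t/2)! = Gamma(t/2 + 1)
    return pi ** (t / 2) * nu ** t / (gamma(t / 2 + 1) * m)

print(merit(67654.37748, 2, 10**10))   # mu_2 for generator (32), about 1.44
print(merit(1017.21089, 3, 10**10))    # mu_3 for generator (32), about 0.44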
We might say that the multiplier $a$ passes the
spectral test if $\mu ↓t$ is 0.1 or more for $2 ≤ t ≤ 6$, and it ``passes
with flying colors'' if $\mu ↓t ≥ 1$ for all these $t$. A low
value of $\mu ↓t$ means that we have probably picked a very
unfortunate multiplier, since very few lattices will have integer
points so close to the origin. Conversely, a high value of $\mu
↓t$ means that we have found an unusually good multiplier for
the given $m$; but it does not mean that the random numbers
are necessarily very good, since $m$ might be too small. Only
the values $\nu ↓t$ truly indicate the degree of randomness.
\topinsert{\vskip 360pt} % Table 1. Important: This must come on even-numbered page!
\topinsert{\vskip 360pt} % The other half of Table 1, will be on facing page.
Table 1 shows what sort of values occur in typical sequences.
Each line of the table considers a particular generator, and
lists $\nu ↓t$, $\mu ↓t$, and the ``number of bits of accuracy''
$\lg \nu ↓t$. Lines 1 through 4 show the generators which were
the subject of Figs.\ 2 and 5 in Section 3.3.1. The generators
in lines 1 and 2 suffer from too small a multiplier; a diagram
like Fig.\ 8 will have $a$ nearly vertical ``stripes'' when $a$
is small. The terrible generator in line 3 has a good $\mu ↓2$
but very poor $\mu ↓3$ and $\mu ↓4$; like nearly all generators
of potency 2, it has $\nu ↓3 = \sqrt{6}$ and
$\nu ↓4 = 2$ (see exercise 3). Line 4 shows a ``random'' multiplier;
this generator has satisfactorily passed numerous empirical
tests for randomness, but it does not have especially high values
of $\mu ↓2, \ldotss , \mu ↓6$. In fact, the value of $\mu ↓5$
flunks our criterion.
Line 5 shows the generator of Fig.\ 8. It passes the spectral
test with very high-flying colors, when $\mu ↓2$ through $\mu
↓6$ are considered, but of course $m$ is so small that the numbers
can hardly be called random; the $\nu ↓t$ values are terribly
low.
Line 6 is the generator discussed above; line 7 is a similar
example, having an abnormally low value of $\mu ↓3$. Line 8
shows a nonrandom multiplier for the same modulus $m$; in terms
of the ``golden ratio'' $\phi = (1 + \sqrt{5})/2$,
this multiplier is $m/\phi$ rounded to the nearest integer congruent
to 1 modulo 20. Such multipliers have been suggested by U. Dieter
and J. Ahrens because the first half of their partial quotients
will be 1, hence the Dedekind sums are likely to be especially
small (cf.\ Sections 3.3.3 and 4.5.3). The generator in line
9 shows another multiplier chosen with malice aforethought,
following a suggestion by A. G. Waterman which guarantees a
reasonably high value of $\mu ↓2$ (see exercise 11).
Lines 10 through 21 of Table 1 show further examples with $m
= 2↑{35}$, beginning with some random multipliers. The generators
in lines 12 and 13 are reminders of the good old days---they
were once used extensively since O. Taussky first suggested
them in the early 1950s. Lines 14 through 18 show various multipliers
of maximum potency having only four 1's in their binary representation.
The point of having few 1's is to replace multiplication by
a few additions, but only line 16 comes near to being passable.
Since these multipliers satisfy $(a - 5)↑3 \mod 2↑{35} = 0$,
all five of them achieve $\nu ↓4$ at the same point
$(x↓1, x↓2, x↓3, x↓4) = (-125, 75, -15, 1)$. Another curiosity
is the high value of $\mu ↓3$ following a very low $\mu ↓2$
in line 18 (see exercise 8). Lines 19 and 20 are respectively
the Dieter--Ahrens and Waterman multipliers for modulus $2↑{35}$;
and line 21 was found by M. Lavaux and F. Janssens, in a computer
search for spectrally good multipliers having a very high $\mu
↓2$.
Lines 22 through 28 apply to System/370
and other machines with 32-bit words; in this case the comparatively
small word size calls for comparatively greater care. Line 22
is, regrettably, the generator which has actually been used
on such machines in most of the world's scientific computing
centers for about a decade; its very name {\tt RANDU} is enough to
bring dismay into the eyes and stomachs of many computer scientists!
The actual generator is defined by
$$X↓0\hjust{ odd,}\qquad X↓{n+1} = (65539 X↓n)\mod 2↑{31},\eqno (38)$$
and exercise 20 indicates that $2↑{29}$ is the appropriate
modulus for the spectral test. Since $9X↓n - 6X↓{n+1} + X↓{n+2}
≡ 0 \modulo{2↑{31}}$, the generator fails most three-dimensional
criteria for randomness, and it should never have been used.
Almost any multiplier $≡ 5 \modulo 8$ would be better. (A curious
fact about {\tt RANDU}, noticed by R. W. Gosper, is that $\nu ↓4 =
\nu ↓5 = \nu ↓6 = \nu ↓7 = \nu ↓8 = \nu ↓9 =
\sqrt{116}$, hence $\mu ↓9$ is a spectacular 11.98.) Lines 23
and 24 are the Dieter--Ahrens and Waterman multipliers for modulus
$2↑{32}$, and lines 26 and 29 were found by Lavaux and Janssens.
Line 25 was nominated by George Marsaglia as ``a candidate for
the best of all multipliers,'' after a computer search in dimensions
2 through 5, partly because it is easy to remember. Line 27
uses a random primitive root, modulo the prime $2↑{31} - 1$, as
multiplier. Line 28 is for the sequence
$$X↓n = (271828183 X↓{n-1} - 314159269 X↓{n-2})\mod(2↑{31}
- 1),\eqno (39)$$
which can be shown to have period length $(2↑{31}
- 1)↑2 - 1$; it has been analyzed with the generalized spectral
test of exercise 24.
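The three-dimensional degeneracy of (38) is easy to observe empirically; the
following few lines (an illustrative aside, not part of the text) generate some
{\tt RANDU} values and confirm that every three consecutive ones satisfy the relation
$9X↓n - 6X↓{n+1} + X↓{n+2} ≡ 0 \modulo{2↑{31}}$ quoted above, so that all triples
fall on a small family of parallel planes:

m31 = 2 ** 31
x, xs = 1, []                       # X_0 odd, as required by (38)
for _ in range(1000):
    x = (65539 * x) % m31
    xs.append(x)
assert all((9 * xs[n] - 6 * xs[n + 1] + xs[n + 2]) % m31 == 0 for n in range(998))
print("every triple satisfies 9*x - 6*x' + x'' = 0 (mod 2^31)")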
% New material not on galleys (C) Addison-Wesley 1978 *
Theoretical upper bounds on
$\mu ↓t$, which can never be transcended for any $m$, are shown just
below Table 1; it is known
that every lattice with $m$ points per unit volume has
$$\nu ↓t ≤ \gamma ↓t\,m↑{1/t},\eqno (40)$$
where $\gamma ↓t$ takes the respective values
$$(4/3)↑{1/4},\quad2↑{1/6},\quad2↑{1/4},\quad2↑{3/10},\quad(64/3)↑{1/12},
\quad2↑{3/7},\quad2↑{1/2}\eqno(41)$$
for $t = 2$, $\ldotss$, 8. (See exercise
9 and J. W. S. Cassels, {\sl Introduction to the Geometry of
Numbers} (Berlin: Springer, 1959), p.\ 332.) These bounds hold
for lattices generated by vectors with arbitrary real coordinates.
For example, the
optimum lattice for $t = 2$ is hexagonal, and it is generated by
vectors of length $\sqrt{2/(m\sqrt3\,)}$ which form
two sides of an equilateral triangle. In three dimensions the
optimum lattice is generated by vectors $V↓1$, $V↓2$, $V↓3$ which
can be rotated into the form $(v, v, -v)$, $(v, -v, v)$, $(-v, v,
v)$, where $v = 1/\raise 5pt\hjust to 0pt{\hskip 2pt minus 20pt
\:f3\hskip 0pt minus 10000pt}\sqrt{4m}$. % cube root
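Combining (35) with (40) gives absolute ceilings on $\mu↓t$ that are independent of
$m$, namely $π↑{t/2}\gamma↑{t}↓{t}/(t/2)!$; the following sketch (an illustrative
aside, not part of the text) tabulates them for $2 ≤ t ≤ 8$:

from math import pi, gamma

gammas = {2: (4/3) ** 0.25, 3: 2 ** (1/6), 4: 2 ** 0.25, 5: 2 ** 0.3,
          6: (64/3) ** (1/12), 7: 2 ** (3/7), 8: 2 ** 0.5}      # the values (41)
for t, g in gammas.items():
    print(t, pi ** (t / 2) * g ** t / gamma(t / 2 + 1))          # about 3.63, 5.92, ... for t = 2, 3, ...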
\subsectionbegin{\star F. Relation to the serial test} In a series of
important papers published during the 1970s, Harald Niederreiter has shown how
to analyze the distribution of the $t$-dimensional vectors (1)
by means of exponential sums.
One of the main consequences of his theory is that a generator which passes
the spectral test will also pass the serial test in several dimensions, even
when we consider only a sufficiently large part of the period instead of the
whole period. We shall now turn briefly to a study of his interesting methods,
in the case of linear congruential sequences $(X↓0,a,c,m)$ of period
length $m$.
The first idea we need is the notion of {\sl discrepancy} in $t$ dimensions,
a quantity which we shall define as the difference between the expected number and
the actual number of $t$-dimensional vectors $(x↓n,x↓{n+1},\ldotss,x↓{n+t-1})$
falling into a hyper-rectangular region, maximized over all such regions. To
be precise, let $\langle x↓n\rangle$ be a sequence of integers in the range
$0≤x↓n<m$. We define
$$D↓N↑{(t)}=\max↓R\left|{\hjust{number of $(x↓n,\ldotss,x↓{n+t-1})$ in $R$ for
$0≤n<N$}\over N}-{\hjust{volume of $R$}\over m↑t}\right|\eqno(42)$$
where $R$ ranges over all sets of points of the form
$$R=\leftset(y↓1,\ldotss,y↓t)\relv α↓1≤y↓1<β↓1,\ldotss,α↓t≤y↓t<β↓t\rightset;
\eqno(43)$$
here $α↓j$ and $β↓j$ are integers in the range $0≤α↓j<β↓j≤m$, for $1≤j≤t$. The
volume of $R$ is clearly $(β↓1-α↓1)\ldotsm(β↓t-α↓t)$. To get the discrepancy
$D↓N↑{(t)}$, we imagine looking at all these sets $R$ and finding the one with
the greatest excess or deficiency of points $(x↓n,\ldotss,x↓{n+t-1})$.
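For very small moduli, definition (42) can be evaluated by sheer enumeration of all
the regions (43); the following sketch (the tiny generator used here is hypothetical,
chosen only so that the enumeration is feasible, and is not from the text) computes
the two-dimensional discrepancy over a full period in this brute-force way:

def discrepancy2(xs, m):
    # D_N^(2) per (42): maximize |count/N - volume/m^2| over all regions (43)
    pairs = list(zip(xs, xs[1:]))
    N = len(pairs)
    worst = 0.0
    for a1 in range(m):
        for b1 in range(a1 + 1, m + 1):
            for a2 in range(m):
                for b2 in range(a2 + 1, m + 1):
                    inside = sum(a1 <= x < b1 and a2 <= y < b2 for x, y in pairs)
                    worst = max(worst, abs(inside / N - (b1 - a1) * (b2 - a2) / m ** 2))
    return worst

m, a, c = 16, 5, 1                   # a toy full-period generator, for illustration only
xs = [0]
for _ in range(m):
    xs.append((a * xs[-1] + c) % m)  # one full period; xs[m] equals xs[0] again
print(discrepancy2(xs, m))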
\def\uvec{u↓1,\ldotss,u↓t}
An upper bound for the discrepancy can be found by using exponential sums. Let
$\omega=e↑{2πi/m}$ be a primitive $m$th root of unity. If $(x↓1,\ldotss,x↓t)$ and
$(y↓1,\ldotss,y↓t)$ are two vectors with all components in the range $0≤x↓j,y↓j
<m$, we have
$$\sum↓{0≤\uvec<m}\omega↑{(x↓1-y↓1)u↓1+\cdots+(x↓t-y↓t)u↓t}=
\left\{\vcenter{\halign{\lft{$#$,}⊗\quad if $(x↓1,\ldotss,x↓t)#(y↓1,\ldotss,y↓t)
$⊗\lft{#}\cr
m↑t⊗=⊗;\cr
0⊗≠⊗.\cr}}\right.$$
Therefore the number of vectors $(x↓n,\ldotss,x↓{n+t-1})$ in $R$ for $0≤n<N$,
when $R$ is defined by (43), can be expressed as
\def\\{\omega↑{x↓nu↓1+\cdots+x↓{n+t-1}u↓t}}
\def\¬{\sum↓{α↓1≤y↓1<β↓1}\?\?\ldots\?\?\sum↓{α↓t≤y↓t<β↓t}\omega↑{-(y↓1u↓1+
\cdots+y↓tu↓t)}}
\def\≡{\sum↓{\scriptstyle 0≤\uvec<m \atop
\scriptstyle(\uvec)≠(0,\ldotss,0)}}
$${1\over m↑t}\sum↓{0≤n<N}\;\sum↓{0≤\uvec<m}\\\¬.$$
When $u↓1=\cdots=u↓t=0$ in this sum, we get $N/m↑t$ times the volume of $R$; hence
we can express $D↓N↑{(t)}$ as
$$\twoline{\max↓R\left|\chop{9pt}{{1\over Nm↑t}\sum↓{0≤n<N}\≡
\\}\right.}{3pt}{\null\times\left.\chop{9pt}{\¬}\right|.}$$
Since complex numbers satisfy $|w+z|≤|w|+|z|$ and $|wz|=|w|\,|z|$, it follows that
$$\halign to size{\rt{$\dispstyle# $}\tabskip 0pt plus 1pt minus 1pt⊗$\;#\;
$⊗$\dispstyle#\cr
D↓N↑{(t)}⊗≤⊗\max↓R{1\over m↑t}\?\≡\?
\left|\chop{9pt}{\¬}\right|\,g(\uvec)$\cr
⊗≤⊗{1\over m↑t}\?\≡\?\max↓R\left|\chop{9pt}{\¬}\right|\,g(\uvec)$\cr
⊗=⊗\≡f(\uvec)\,g(\uvec)$,\hfill(44)\cr}$$
where
$$\eqalign{g(\uvec)⊗=\left|\chop{9pt}{{1\over N}\sum↓{0≤n<N}\\}\right|;\cr
\noalign{\vskip 7pt}
f(\uvec)⊗=\max↓R{1\over m↑t}\left|\chop{9pt}{\¬}\right|\cr
\noalign{\vskip 2pt}
⊗=\max↓R\left|\chop{9pt}{{1\over m}\sum↓{α↓1≤y↓1<β↓1}\omega↑{-u↓1y↓1}}\right|\ldotsm
\left|\chop{9pt}{{1\over m}\sum↓{α↓t≤y↓t<β↓t}\omega↑{-u↓ty↓t}}\right|.\cr}$$
Both $f$ and $g$ can be further simplified in order to get a good upper bound on
$D↓N↑{(t)}$. We have
$$\left|\chop{9pt}{{1\over m}\sum↓{α≤y<β}\omega↑{-uy}}\right|=
\left|{1\over m}{\omega↑{-βu}-\omega↑{-αu}\over\omega↑{-u}-1}\right|
≤{2\over m|\omega↑u-1|}={1\over m\sin(πu/m)}$$
when $u≠0$, and the sum is $≤1$ when $u=0$, hence
$$f(\uvec)≤r(\uvec)\eqno(45)$$
where
$$r(\uvec)=\prod↓{\scriptstyle 1≤k≤t\atop\scriptstyle u↓k≠0}{1\over m\sin(πu↓k/m)}.
\eqno(46)$$
Furthermore, when $\langle x↓n\rangle$ is generated modulo $m$ by a linear
congruential sequence, we have
$$\eqalign{x↓nu↓1+\cdots+x↓{n+t-1}u↓t⊗=x↓nu↓1+(ax↓n+c)u↓2+\cdots\cr
⊗\hskip 60pt \null+\biglp a↑{t-1}x↓n+
c(a↑{t-2}+\cdots+1)\bigrp u↓t\cr
\noalign{\vskip 3pt}
⊗=(u↓1+au↓2+\cdots+a↑{t-1}u↓t)x↓n+h(\uvec)\cr}$$
where $h(\uvec)$ is independent of $n$, hence
$$g(\uvec)=\left|\chop{9pt}{{1\over N}\sum↓{0≤n<N}\omega↑{q(\uvec)x↓n}}\right|
\eqno(47)$$
where
$$q(\uvec)=u↓1+au↓2+\cdots+a↑{t-1}u↓t.\eqno(48)$$
Now {\sl here is where the connection to the spectral test comes in\/}: We will
show that the sum $g(\uvec)$ is rather small unless $q(\uvec)≡0\modulo m$, i.e.,
unless $(\uvec)$ is a solution to (15). Furthermore exercise 27 shows that
$r(\uvec)$ is rather small when $(\uvec)$ is a ``large'' solution to (15). Hence
the discrepancy $D↓N↑{(t)}$ will be rather small when (15) has only ``large''
solutions, i.e., when the spectral test is passed. All that remains is to
quantify these qualitative statements by making careful calculations.
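A tiny numerical experiment makes the first of these qualitative statements concrete.
The following sketch (toy parameters, not from the text) evaluates $g(u↓1,u↓2)$ of
(47) over a full period and shows that it is 1 when $q(u↓1,u↓2)$ of (48) is a
multiple of $m$, and essentially zero otherwise:

import cmath

m, a, c = 32, 5, 1                     # a toy full-period generator, for illustration only
xs = [0]
for _ in range(m - 1):
    xs.append((a * xs[-1] + c) % m)
omega = cmath.exp(2j * cmath.pi / m)

def g(u1, u2):
    # |(1/N) sum of omega^(x_n*u1 + x_{n+1}*u2)| over the full period, cf. (47)
    total = sum(omega ** ((xs[n] * u1 + xs[(n + 1) % m] * u2) % m) for n in range(m))
    return abs(total) / m

for u1, u2 in [(1, 0), (3, 1), ((-a) % m, 1), ((-2 * a) % m, 2)]:
    print((u1, u2), (u1 + a * u2) % m, round(g(u1, u2), 6))   # g is essentially 1 when q = 0 (mod m), else 0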
In the first place, let's consider the size of $g(\uvec)$. When $N=m$, so that
the sum (47) is over an entire period, we have $g(\uvec)=0$ except when $(\uvec)$
satisfies (15), so the discrepancy is bounded above in this case by the sum of
$r(\uvec)$ taken over all the nonzero solutions of (15). But let's consider also
what happens in a sum like (47) when $N$ is less than $m$ and $q(\uvec)$ is not
a multiple of $m$. We have
$$\eqalignno{{1\over N}\sum↓{0≤n<N}\omega↑{x↓n}⊗={1\over N}\sum↓{0≤n<N}{1\over m}
\sum↓{0≤k<m}\omega↑{-nk}\sum↓{0≤j<m}\omega↑{x↓j+jk}\cr
⊗={1\over N}\sum↓{0≤k<m}\bigglp{1\over m}\sum↓{0≤n<N}\omega↑{-nk}\biggrp S↓{k0}
⊗(49)\cr}$$
where
$$S↓{kl}=\sum↓{0≤j<m}\omega↑{x↓{j+l}+jk}.\eqno(50)$$
Now $S↓{kl}=\omega↑{-lk}S↓{k0}$, so $|S↓{kl}|=|S↓{k0}|$ for all $l$, and we can
calculate this common value by further exponential-summery:
$$\eqalign{|S↓{k0}|↑2⊗={1\over m}\sum↓{0≤l<m}|S↓{kl}|↑2\cr
⊗={1\over m}
\sum↓{0≤l<m}\,\sum↓{0≤j<m}\omega↑{x↓{j+l}+jk}\sum↓{0≤i<m}\omega↑{-x↓{i+l}-ik}\cr
⊗={1\over m}\sum↓{0≤i,j<m}\omega↑{(j-i)k}\sum↓{0≤l<m}\omega↑{x↓{j+l}-x↓{i+l}}\cr
⊗={1\over m}\sum↓{0≤i<m}\,\sum↓{i≤j<m+i}\omega↑{(j-i)k}\sum↓{0≤l<m}\omega↑
{(a↑{j-i}-1)x↓{i+l}+(a↑{j-i}-1)c/(a-1)}.\cr}$$
Let $s$ be the least positive integer such that $a↑s≡1\modulo m$, and let $$s↑\prime=(a↑s-1)c/
(a-1)\mod m.$$ Then $s$ is a divisor of $m$, and $x↓{n+js}≡x↓n+js↑\prime\modulo m$.
The sum on $l$ vanishes unless $j-i$ is a multiple of $s$, so we find that
$$|S↓{k0}|↑2=m\sum↓{0≤j<m/s}\omega↑{jsk+js↑\prime}.$$
We have $s↑\prime=q↑\prime s$ where $q↑\prime$ is relatively prime to $m$
(cf.\ exercise 3.2.1-21), so it turns out that
$$|S↓{k0}|=\left\{\vcenter{\halign{\lft{$#,$}\qquad⊗if $k+q↑\prime#0\modulo{m/s}
$⊗\lft{#}\cr
0⊗\neqv⊗;\cr m/\sqrt s⊗≡⊗.\cr}}\right.\eqno(51)$$
Putting this information back into (49), and recalling the derivation of (45),
shows that
$$\left|\chop{9pt}{{1\over N}\sum↓{0≤n<N}\omega↑{x↓n}}\right|≤{m\over N\sqrt s}
\sum↓k r(k),\eqno(52)$$
where the sum is over $0<k<m$ such that $k+q↑\prime≡0\modulo{m/s}
$. Exercise 25 now can be used to estimate the remaining sum, and we find that
$$\left|\chop{9pt}{{1\over N}\sum↓{0≤n<N}\omega↑{x↓n}}\right|≤{2\over π}
{\sqrt s \over N}\ln s + O\left(m \over N\sqrt s\right).\eqno(53)$$
The same bound can be used to estimate $|N↑{-1}\sum↓{0≤n<N}\omega↑{qx↓n}|$ for
any $q\neqv 0\modulo m$, since the effect is to replace $m$ in this derivation
by a divisor of $m$. In fact, the upper bound gets even smaller when $q$ has
a factor in common with $m$, since $s$ and $m/\sqrt s$ generally become
smaller. (See exercise 26.)
We have now proved that the $g(\uvec)$ part of our upper bound (44) on the
discrepancy is small, if $N$ is large enough and if $(\uvec)$ does not
satisfy the spectral test congruence (15). Exercise 27 proves that the
$f(\uvec)$ part of our upper bound is small, when summed over all the
nonzero vectors $(\uvec)$ satisfying (15), provided that all such vectors are
far away from $(0,\ldotss,0)$. Putting these results together leads to the following
theorem of Niederreiter:
\thbegin Theorem N. {\sl Let $\langle X↓n\rangle$ be a linear congruential
sequence $(X↓0,a,c,m)$ of period length $m$, and let $s$ be the least positive
integer such that $a↑s≡1\modulo m$. Let $\nu↓t$ be the $t$-dimensional accuracy
of $\langle X↓n\rangle$, as determined by the spectral test. Then the
$t$-dimensional discrepancy $D↓N↑{(t)}$ determined by the first $N$ values of
$\langle X↓n \rangle$, as defined in $(42)$, satisfies}
$$\eqalignno{D↓N↑{(t)}⊗=O\left(\sqrt s\log s\,(\log m)↑t\over N\right)+
O\left(m(\log m)↑t\over N\sqrt s\right)+O\left((\log m)↑t\over \nu↓t\right);
⊗(54)\cr
\noalign{\vskip 3pt}
D↓m↑{(t)}⊗=O\left((\log m)↑t\over\nu↓t\right).⊗(55)\cr}$$
\proofbegin The first two $O$ terms in (54) come from vectors $(\uvec)$ in (44)
that do not satisfy (15), since exercise 25 proves that $f(\uvec)$ summed over
{\sl all} $(\uvec)$ is $O\left(((2/π)\ln m)↑t\right)$ and exercise 26 bounds
each $g(\uvec)$. $\biglp$These terms are missing from (55) since $g(\uvec)=0$ in
that case.$\bigrp$ The remaining $O$ term in (54) and (55) comes from nonzero
vectors $(\uvec)$ that do satisfy (15), using the bound derived in exercise 27.
(By examining this proof carefully it is possible to replace each
$O$ in these formulas by an explicit function of $t$.)\quad\blackslug
\yyskip Eq.\ (55) relates
to the serial test in $t$ dimensions over the entire period,
while Eq.\ (54) gives us information about the distribution of the first $N$
generated values when $N$ is less than $m$. Note that (54) will guarantee
small discrepancy only when $s$ is sufficiently large, otherwise the $m/\sqrt s$
term will dominate. If $m=p↓1↑{e↓1}\ldotss p↓r↑{e↓r}$ and $\gcd(a-1,m)=p↓1↑{f↓1}
\ldotss p↓r↑{f↓r}$, then $s$ equals $p↓1↑{e↓1-f↓1}\ldotss p↓r↑{e↓r-f↓r}$ (cf.\
Lemma 3.2.1.2P); thus, the largest values of $s$ correspond to high potency.
In the common case $m=2↑e$ and $a≡5\modulo 8$, we have $s={1\over4}m$, so
$D↓N↑{(t)}$ is $O\biglp\sqrt m(\log m)↑{t+1}/N\bigrp+O\biglp(\log m)↑t/\nu↓t\bigrp$;
Eq.\ (54) says that the discrepancy will be low in $t$ dimensions if the spectral
test is passed and $N$ is somewhat larger than $\sqrt m(\log m)↑{t+1}$. (The
methods we have used appear to give useful information only when $N$ is large
compared to $\sqrt m$.)
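The quantity $s$ is easy to compute directly, and the statement $s={1\over4}m$ for
$m=2↑e$ and $a≡5\modulo 8$ is readily checked; the following lines (an illustrative
aside with arbitrarily chosen multipliers, not from the text) do so for a small word
size:

def order(a, m):
    # least s > 0 with a^s = 1 (mod m); requires gcd(a, m) = 1
    s, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        s += 1
    return s

m = 2 ** 16
for a in (5, 13, 21, 69069 % m):
    assert a % 8 == 5 and order(a, m) == m // 4
print("s = m/4 for each multiplier tested")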
%folio 135 galley 8 (C) Addison-Wesley 1978 *
\subsectionbegin{G. Historical remarks} In 1959,
while deriving upper bounds for the error in the evaluation
of $t$-dimensional integrals by the Monte Carlo method, N. M.
Korobov devised a way to rate the multiplier of a linear congruential
sequence. His formula (which is rather complicated) is related
to the spectral test since it is strongly influenced by ``small''
solutions to (15); but it is not quite the same. Korobov's
test has been the subject of an extensive literature, surveyed
by Kuipers and Niederreiter in {\sl Uniform Distribution of
Sequences} (New York: Wiley, 1974), $\char'570 2.5$.
The spectral test was originally formulated by R. R. Coveyou and
R. D. MacPherson [{\sl JACM} {\bf 14} (1967), 100--119],
who introduced it in an interesting indirect way.
Instead of working with the grid structure
of successive points, they considered random-number generators
as sources of $t$-dimensional ``waves.'' The numbers
$\sqrt{x↑{2}↓{1} + \cdots + x↑{2}↓{t}}$ such that
$x↓1+ ax↓2 + \cdots + a↑{t-1}x↓t ≡ 0 \modulo m$ in their original
treatment were the wave ``frequencies,''
or points in the ``spectrum'' defined by the random-number generator,
with low-frequency waves being the most damaging to randomness;
hence the name {\sl spectral test.} Coveyou and MacPherson introduced
a procedure analogous to Algorithm S for performing their test,
based on the principle of Lemma A. However, the original procedure
(which used matrices $UU↑T$ and $VV↑T$ instead of $U$ and $V$)
dealt with extremely large numbers; the idea of working directly
with $U$ and $V$ was independently suggested by F. Janssens
and by U. Dieter.
Several other authors pointed out that the spectral test could
be understood in far more concrete terms; by introducing the
study of the grid and lattice structures corresponding to linear
congruential sequences, the fundamental limitations on randomness
became graphically clear. See G. Marsaglia, {\sl Proc.\ Nat.\
Acad.\ Sci.\ \bf 61} (1968), 25--28; W. W. Wood, {\sl J. Chem.\
Phys.\ \bf 48} (1968), 427; R. R. Coveyou, {\sl Studies in Applied
Math.\ \bf 3} (1969), 70--112; W. A. Beyer, R. B. Roof, and D.
Williamson, {\sl Math.\ Comp.\ \bf 25} (1971), 345--360; G. Marsaglia
and W. A. Beyer, {\sl Applications of Number Theory to Numerical
Analysis}, ed.\ by S. K. Zaremba (New York: Academic Press, 1972),
249--285, 361--370.
Harald Niederreiter's papers concerning the use of exponential sums to study
the distribution of linear congruential sequences have appeared in {\sl Math.\
Comp.\ \bf 26} (1972), 793--795; {\bf28} (1974), 1117--1132; {\bf30} (1976),
571--597; {\sl Advances in Math.\ \bf26} (1977), 99--181 [this is the most
important paper of the series]; and
{\sl Bull.\ Amer.\ Math.\ Soc.\ \bf84} (1978), 273--274.
%folio 137 galley 9 (C) Addison-Wesley 1978 *
\exbegin{EXERCISES}
\exno 1. [M10] To what
does the spectral test reduce in {\sl one} dimension? (In other words, what
happens when $t=1$?)
\exno 2. [HM20] Let $V↓1$, $\ldotss$, $V↓t$ be linearly independent
vectors in $t$-space, let $L↓0$ be the lattice of points defined
by (10), and let $U↓1$, $\ldotss$, $U↓t$ be defined by (19). Prove
that the maximum distance between $(t - 1)$-dimensional hyperplanes,
over all families of parallel hyperplanes that cover $L↓0$,
is $1/\sqrt{\min\leftset f(x↓1, \ldotss , x↓t) \relv (x↓1, \ldotss , x↓t) ≠ (0,
\ldotss , 0)\rightset}$, where $f$ is defined in (17).
\exno 3. [M24] Determine $\nu ↓3$
and $\nu ↓4$ for all linear congruential generators of potency 2 and
period length $m$.
\trexno 4. [M23] Let $u↓{11}$, $u↓{12}$, $u↓{21}$, $u↓{22}$ be elements
of a $2 \times 2$ integer matrix such that $u↓{11} + au↓{12} ≡
u↓{21} + au↓{22} ≡ 0 \modulo m$ and $u↓{11}u↓{22} - u↓{21}u↓{12}
= m$.\xskip (a) Prove that all integer solutions $(y↓1, y↓2)$ to the
congruence $y↓1 + ay↓2 ≡ 0 \modulo m$ have the form $(y↓1,
y↓2) = (x↓1u↓{11}\! +\!x↓2u↓{21}, x↓1u↓{12}\!+\!x↓2u↓{22})$ for integer
$x↓1$, $x↓2$.\xskip (b) If, in addition, $2 | u↓{11}u↓{21} + u↓{12}u↓{22}
| ≤ u↑{2}↓{11} + u↑{2}↓{12} ≤ u↑{2}↓{21} + u↑{2}↓{22}$, prove
that $(y↓1, y↓2) = (u↓{11}, u↓{12})$ minimizes $y↑{2}↓{1} +
y↑{2}↓{2}$ over all nonzero solutions to the congruence.
\exno 5. [M30] Prove that steps S1 through S3 of Algorithm S
correctly perform the spectral test in two dimensions.\xskip [{\sl
Hint:} See exercise 4, and prove that $(h↑\prime + h)↑2
+ (p↑\prime + p)↑2 ≥ h↑2 + p↑2$ at the beginning of step S2.]
\exno 6. [M30] Let $a↓0$, $a↓1$, $\ldotss$, $a↓{t-1}$ be the partial
quotients of $a/m$ as defined in Section 3.3.3, and let $A =
\max↓{0≤j<t}a↓j$. Prove that $\mu ↓2 > 2π/\biglp A + 1 + 1/A\bigrp$.
\exno 7. [HM22] Prove that ``question
(a)'' and ``question (b)'' of the text have the same solution
for real values of $q↓1$, $\ldotss$, $q↓{j-1}$, $q↓{j+1}$, $\ldotss$,
$q↓t$ $\biglp$cf.\ (24), (26)$\bigrp$.
\exno 8. [M16] Line 18
of Table 1 has a very low value of $\mu ↓2$, yet $\mu ↓3$ is
quite satisfactory. What is the highest possible value of $\mu
↓3$ when $\mu ↓2 = 10↑{-6}$ and $m = 10↑{10}$?
\exno 9. [HM32] (C. Hermite, 1846.)\xskip
Let $f(x↓1, \ldotss , x↓t)$ be a positive definite quadratic
form, defined by the matrix $U$ as in (17), and let $\theta$
be the minimum value of $f$ at nonzero integer points. Prove
that $\theta ≤ ({4\over 3})↑{(t-1)/2} |\det U | ↑{2/t}$.\xskip [{\sl Hints:}
If $W$ is any integer matrix of determinant 1, the matrix
$WU$ defines a form equivalent to $f$; and if $S$ is any orthogonal
matrix (i.e., $S↑{-1} = S↑T$), the matrix $US$ defines a form
identically equal to $f$. Show that there is an equivalent form
$g$ whose minimum $\theta$ occurs at $(1, 0, \ldotss , 0)$. Then
prove the general result by induction on $t$, writing $g(x↓1,
\ldotss , x↓t) = \theta (x↓1 + β↓2x↓2 + \cdots + β↓tx↓t)↑2 +
h(x↓2, \ldotss , x↓t)$ where $h$ is a positive definite quadratic
form in $t - 1$ variables.]
\exno 10. [M28] Let $(y↓1, y↓2)$ be relatively prime integers
such that $y↓1 + ay↓2 ≡ 0 \modulo m$ and $y↑{2}↓{1} + y↑{2}↓{2}
< \sqrt{4/3}\, m$. Show that there exist integers $(u↓1,
u↓2)$ such that $u↓1 + au↓2 ≡ 0 \modulo m$, $u↓1y↓2 - u↓2y↓1
= m$, $2|u↓1y↓1 + u↓2y↓2| ≤ \min(u↑{2}↓{1}\! +\! u↑{2}↓{2}, y↑{2}↓{1}
\!+\! y↑{2}↓{2})$, and $(u↑{2}↓{1} + u↑{2}↓{2})(y↑{2}↓{1} + y↑{2}↓{2})
≥ m↑2$. (Hence by exercise 4, $\nu ↑{2}↓{2} = \min(u↑{2}↓{1}
\!+\! u↑{2}↓{2}, y↑{2}↓{1} \!+\! y↑{2}↓{2})$.)
\trexno 11. [HM30] (Alan G. Waterman, 1974.)\xskip Invent a reasonably
efficient procedure that computes multipliers $a ≡ 1 \modulo
4$ for which there exists a relatively prime solution to the
congruence $y↓1 + ay↓2 ≡ 0 \modulo m$ with $y↑{2}↓{1} + y↑{2}↓{2}
= \sqrt{4/3}\,m - \epsilon $, where $\epsilon > 0$ is
as small as possible, given $m = 2↑e$. (By exercise 10, this
choice of $a$ will guarantee that $\nu ↑{2}↓{2} ≥ m↑2/(y↑{2}↓{1}
+ y↑{2}↓{2}) > \sqrt{3/4}\,m$, and there is a chance
that $\nu ↑{2}↓{2}$ will be near its optimum value
$\sqrt{4/3}\,m$. In practice we will compute several such multipliers
having small $\epsilon $, choosing the one with best spectral
values $\nu ↓2$, $\nu ↓3$, $\ldotss\,$.)
\exno 12. [HM23] Prove, without geometrical handwaving, that
any solution to the text's ``question (b)'' must also satisfy
the set of equations (26).
\exno 13. [HM22] Lemma A uses the fact that $U$ is nonsingular
to prove that a positive definite quadratic form attains a definite,
nonzero minimum value at nonzero integer points. Show that this
hypothesis is necessary, by exhibiting a quadratic form (19)
whose matrix of coefficients is singular, and for which the
values of $f(x↓1, \ldotss , x↓t)$ get arbitrarily near zero (but
never reach it) at nonzero integer points $(x↓1, \ldotss , x↓t)$.
\exno 14. [24] Perform Algorithm S by hand, for $m = 100$, $a
= 41$, $T = 3$.
\trexno 15. [M20] Let $U$ be an integer vector satisfying (15). How many of the
$(t - 1)$-dimensional hyperplanes defined by $U$ intersect
the unit hypercube $\leftset(x↓1, \ldotss , x↓t)\relv 0 ≤ x↓j < 1 \hjust{ for }
1 ≤ j ≤ t\rightset$? (This is approximately the number of hyperplanes
in the family which will suffice to cover $L↓0$.)
\exno 16. [M30] (U. Dieter.)\xskip Show how to modify Algorithm S
in order to calculate the minimum number $N↓t$ of parallel hyperplanes
intersecting the unit hypercube as in exercise 15, over all
$U$ satisfying (15).\xskip [{\sl Hint:} What are appropriate analogs to
positive definite quadratic forms and to Lemma A?]
\exno 17. [20] Modify Algorithm S so that, in addition to computing
the quantities $\nu ↓t$, it outputs all integer vectors $(u↓1,
\ldotss , u↓t)$ satisfying (15) such that $u↑{2}↓{1} + \cdots
+ u↑{2}↓{t} = \nu ↑{2}↓{t}$, for $2 ≤ t ≤ T$.
\exno 18. [M30] (a) Let $m = 2↑e$, where $e$ is even. By considering
``combinatorial matrices,'' i.e., matrices whose elements have
the form $y + x\delta ↓{ij}$ (cf.\ exercise 1.2.3--39), find
$3 \times 3$ matrices of integers $U$ and $V$ satisfying (29)
such that the transformation of step S5 does nothing for any
$j$, but the corresponding values of $z↓k$ in (31) are so huge
that exhaustive search is out of the question. (The matrix $U$
need not satisfy (28); we are interested here in {\sl arbitrary}
positive definite quadratic forms of determinant $m$.)\xskip (b) Although
transformation (23) is of no use for the matrices constructed
in (a), find another transformation which does produce a substantial
reduction.
\trexno 19. [HM25] Suppose step S5 were changed slightly, so that
a transformation with $q = 1$ would be performed when $2V↓i
\cdot V↓j = V↓j \cdot V↓j$. (Thus, $q = \lfloor (V↓i \cdot V↓j\;/\;V↓j
\cdot V↓j) + {1\over 2}\rfloor$ in all cases.) Would it be possible
for Algorithm S to get into an infinite loop?
\exno 20. [M21] Discuss how to carry out an appropriate spectral
test for linear congruential sequences having $c = 0$, $X↓0$ odd,
$m = 2↑e$, $a \mod 8 = 5$.
\exno 21. [M20] (R. W. Gosper.)\xskip A certain application uses random
numbers in batches of four, but ``throws away'' the second of
each set. How can we study the grid structure of $\left\{{1\over m}(X↓{4n},
X↓{4n+2}, X↓{4n+3})\right\}$, given a linear congruential generator
of period $m = 2↑e$?
\exno 22. [M46] What is the best upper bound on $\mu ↓3$, given
that $\mu ↓2$ is very near its maximum value $\sqrt{4/3}\,π$?
What is the best upper bound on $\mu ↓2$, given that $\mu ↓3$
is very near its maximum value ${4\over 3}π
\sqrt{2}$?
\exno 23. [M48] Let $U↓i$, $V↓j$ be vectors of real numbers with
$U↓i \cdot V↓j = \delta ↓{ij}$ for $1 ≤ i, j ≤ t$, and such
that $U↓i \cdot U↓i = 1$, $2|U↓i \cdot U↓j| ≤ 1$, $2|V↓i \cdot V↓j|
≤ V↓j \cdot V↓j$ for $i ≠ j$. How large can $V↓1 \cdot V↓1$
be? (This question relates to the bounds in step S8, if both
(23) and the transformation of exercise 18(b) fail to make any
reductions. The author suspects that $V↓1 \cdot V↓1 < 2$.)
\trexno 24. [M28] Generalize the spectral test to sequences of
the form $X↓n = (aX↓{n-1} + bX↓{n-2})\mod p$, having period
length $p↑2 - 1$. (Cf.\ Eq.\ 3.2.2--8.) How should Algorithm S
be modified?
\exno 25. [HM24] Let $d$ be a divisor of $m$ and let $0≤q<d$. Prove that
$\sum r(k)$, summed over all $0≤k<m$ such that $k\mod d=q$, is at most
$(2/dπ)\ln(m/d)+O(1)$. $\biglp$Here $r(k)$ is defined in Eq.\ (46) when
$t=1$.$\bigrp$
\exno 26. [M22] Explain why the derivation of (53) leads to a similar
bound on $$\lower15pt\null\left|\chop{9pt}{N↑{-1}\sum↓{0≤n<N}\omega↑{qx↓n}}\right|$$
for $0<q<m$. Where
does the derivation of (53) break down when $m=1$?
\exno 27. [HM39] (E. Hlawka, H. Niederreiter.)\xskip Let $r(u↓1,\ldotss,u↓t)$
be the function defined in (46). Prove that $\sum r(u↓1,\ldotss,u↓t)$, summed
over all $0≤u↓1,\ldotss,u↓t<m$ such that $(u↓1,\ldotss,u↓t)≠(0,\ldotss,0)$ and
(15) holds, is $O\biglp(2\lg m)↑t/\nu↓t\bigrp$.
\trexno 28. [M28] (H. Niederreiter.)\xskip Find an analog of Theorem N for the
case $m=$ prime, $c=0$, $a=$ primitive root modulo $m$,
$X↓0 \neqv 0\modulo m$.\xskip [{\sl Hint:} Your exponential sums should involve
$\zeta=e↑{2πi/(m-1)}$ as well as $\omega$.]\xskip Prove that in this case
the ``average'' primitive root has discrepancy $D↓{m-1}↑{(t)}=
O\left(t(\log m)↑t/\phi(m-1)\right)$, hence good primitive roots exist
for all $m$.
\vfill\eject